PROJECT SUMMARY


The work reported here was supported by the Innovative
Science and Technology Office (IST) of the Strategic Defense
Initiative Organization (SDIO) and was administered through the
Office of Naval Research (ONR) under Contract No.
N00014-85-K-0479. The period of performance was 1 June to 31
December 1985. The project monitors were Dr. James Ionson and Dr.
Dwight Dustin at SDIO/IST and Dr. Edward Wegman at ONR. The
project science and technology agents were Dr. Keith Bromley and
Mr. William Miceli at the Naval Ocean Systems Center (NOSC). The
project manager at the University of Dayton (prime contractor)
was Dr. Eugene Gerber, and the principal investigator was Dr.
Steven Gustafson. The program technical director was Dr. H. John
Caulfield at the University of Alabama in Huntsville.

The technical program covered a broad range of basic
research in the optical processing and computing areas.

Technical presentations were made by the University of Dayton and
the thirteen subcontractors at special meeting sessions in San
Diego, CA, on 21 August and in Washington, DC, on 14 October 1985.
For the final report, each organization was asked to submit a
technical abstract; a summary (including objectives, a description
of the work performed and results, and conclusions and
recommendations); and a technical discussion, which could consist
of papers prepared for publication on the program. The
University of Dayton prepared ten candidate papers for publication
through the program, and the subcontractors prepared at least as
many. Thus, as an intense basic research effort, the program was
clearly successful.


ABSTRACT 


This program covered a broad range of basic research in the
optical processing and computing areas. A majority of the effort
was carried out by 13 subcontractors (seven universities and six
industrial organizations). The performing organizations and their
technical contributions were as follows: (1) Aerodyne Research
Inc., optical parallel 2-D neighborhood processor and optical
processor assessment technique; (2) University of Alabama in
Huntsville, high accuracy with moderately accurate components and
optical Fredkin gate architectures; (3) Battelle Columbus
Laboratories, integrated optical threshold computing, pipelined
polynomial processor, and all-optical analog/digital converter;
(4) BDM Corporation, adaptive optical associative memory model
with attention; (5) California Institute of Technology,
effectiveness of parallelism and connectivity in optical computers;
(6) University of California, Irvine, optical systolic array
processing using an integrated acoustooptic module; (7) University
of Dayton Research Institute, optical threshold elements and
networks, holographic threshold processors, adaptive matched spatial
filtering, and coherence theory in optical computing; (8)
Carnegie-Mellon University, time-varying optical processing for
sub-pixel targets, optical Kalman filtering, and adaptive matched
filtering; (9) Georgia Institute of Technology, optical degrees
of freedom, ultra-short optical pulses, number representations,
content-addressable-memory processors, and integrated optical
Givens rotation devices; (10) Probe Systems Inc., optical J-K
flip-flop analysis and interfacing for optical computers; (11)
RGB Associates, matrix multiplication algorithms and limits of
incoherent optical computers; (12) Science Applications
International Inc., architecture for machine vision with sensor
fusion, pattern recognition functions, and neural net
implementations; (13) University of Southern California, optical
computing algorithms, architectures, and components; (14) Stanford
University, dynamic optical interconnections, advantages and
architectures.
TITLE:

ADAPTIVE OPTICAL COMPUTING

ABSTRACT

A model for optical associative memory is described that incorporates
attention. This attention takes the form of increased predisposition
toward certain memory states. Furthermore, the new formulation of optical
associative memory allows nonlinear processing in the correlation
domain, thereby increasing robustness to various errors. Results of
computer simulation are presented, and the design of an optoelectronic
testbed is described.
SUMMARY

A. OBJECTIVES


The concept of associative storage of data is an attractive one for
parallel systems such as optical processors. In an associative data
store, information is retrieved not by location but by another piece of
information that may be distinct from the stored information
(hetero-associative) or the same as the stored information
(auto-associative). The objective of this contract was to examine
different formulations of associative memory and find one that is
suitable for optical implementation. A further objective of the program
was to introduce more versatility into the performance characteristics
of the associative data store.

B. DESCRIPTION OF WORK PERFORMED AND RESULTS


The models of associative memory proposed in the literature involve
storing the data in matrix form by summing the rank-one matrices that
result from outer products between the data vectors to be stored and
the key vectors to be used during retrieval. Such a method produces a
delocalized storage of data in which one datum is stored at several
locations and several sites collaborate in storing one piece of
information. This feature results in a system that is very resistant to
failure of hardware components. Such a system is also relatively immune
to noisy and/or incomplete retrieval vectors. Such systems have been
under investigation for the last 20 years. Recently Psaltis and Farhat
have proposed an optical implementation of one particular model of
auto-associative memory developed by Hopfield.

A closer inspection of the equations for the Hopfield model revealed
that it can be decomposed into a two-step process: (i) compare the input
vector with the stored set of data vectors, and (ii) calculate a linear
superposition of the stored vectors with the similarity measures
calculated in step (i) as the coefficients. This procedure is then
applied iteratively, with a hardclipping nonlinearity in between to
binarize the retrieved vector (the stored vectors are known to be
binary). We then proposed a model of associative memory in which these
two steps are carried out explicitly via two vector-matrix
multiplications. This allows one to weight the data vectors differently
in the correlation (or inner product) domain and to introduce a
nonlinearity to suppress cross-talk. The nonuniform weighting
corresponds to introducing ATTENTION in the associative memory, hence
the name ATTENTIVE ASSOCIATIVE MEMORY.

In this six-month period, we performed computer simulations of the
basic model of attentive associative memory (AAM). The simulations
showed that the AAM can store highly correlated vectors and still
retrieve them without error. They also demonstrated that, by nonuniform
weighting, one can retrieve a vector that is weaker than another vector.
An optoelectronic testbed that requires only off-the-shelf components
was also designed. These results were presented in a post-deadline
paper at the 1985 Annual Meeting of the Optical Society of America.
The fabrication of the optoelectronic testbed was completed using
internal BDM funds. The testbed performed in a manner consistent with
the computer simulations.

C. CONCLUSIONS AND RECOMMENDATIONS

It was concluded that an alternative formulation of associative memory
that does not involve delocalized storage via outer products
introduces additional flexibility in the operation of the memory. This
flexibility can be used to incorporate ATTENTION, or predisposition
toward certain stored states, as well as additional nonlinearities for
cross-talk suppression.

The attentive associative memory can be extended to hetero-associative
data storage. This new direction can have important consequences for
database systems. The vector-matrix multiplier required by this
architecture can be implemented in a compact optical system using
properly designed spatial light modulators. An important aspect of the
associative memory is the training of the system. A very fruitful
direction of research will be the study of different learning
mechanisms as applied to attentive associative memory.
ATTENTIVE ASSOCIATIVE MEMORY AND ITS OPTICAL IMPLEMENTATION

RAVINDRA A. ATHALE, HAROLD H. SZU,* AND CARL B. FRIEDLANDER

THE BDM CORPORATION
7915 JONES BRANCH DRIVE
McLEAN, VA 22102

*US NAVAL RESEARCH LABORATORY
CODE 5709
WASHINGTON, DC 20375


ABSTRACT

A mathematical model for incorporating ATTENTION in the conventional
associative memory is described. Such a mechanism provides the
flexibility to change the strengths of the stored states in an
associative memory rapidly. This Attentive Associative Memory can be
implemented optically. Results obtained with a computer simulation and
with an optoelectronic testbed built from off-the-shelf components are
discussed.
A. INTRODUCTION


Optical processors have the unique features of massive parallelism,
flexible and global interconnection, and ease of performing analog
multiplication and addition. Over the years a number of architectures
based on Fourier transform and convolution/correlation operations have
been developed. In the last two years, neural network models have
excited the imagination of the optical processing community as a source
of inspiration. The pioneering work of Psaltis and Farhat¹ and of
Fisher, Giles, and Lee² has been followed by increased activity, as
evidenced by the special symposium on Associative Memories and Optics at
the 1985 Annual Meeting of the Optical Society of America.³

In this letter we modify the conventional formulation of associative
memory to incorporate attention, that is, increased sensitivity to the
presence of a given stored data vector. We first describe one version of
the associative memory formulated in terms of linear algebra, then
discuss the proposed modification along with its consequences. The
results of a computer simulation are discussed next, followed by the
design of an optoelectronic testbed and the experimental results
obtained with it. Finally we discuss some directions for further work.

B. MATHEMATICAL FORMULATION OF ATTENTIVE ASSOCIATIVE MEMORY


The simplest model of an associative memory designed to store
N-dimensional column vectors consists of two steps: the first step is
the recording of the set of P input vectors in an N × N memory matrix
via the outer product of each input vector with itself, and the second
step is the retrieval of a data vector from an incomplete and/or noisy
version of the vector via a vector-matrix multiplication. These two
steps are described mathematically by the following equation:⁴
$$M = \sum_{i=1}^{P} v^{(i)} v^{(i)T} \qquad \text{RECORDING}$$

$$v = M v', \qquad v_j = \sum_{k=1}^{N} M_{jk}\, v'_k \qquad \text{RETRIEVAL} \qquad (1)$$

where v is an N-dimensional column vector, M is an N × N matrix, v' is
the imperfect recall vector, and v(i) is one of the P stored vectors.
The last part of Equation (1) can be rewritten by substituting for M_jk
the values derived from the first part of Equation (1):

$$v_j = \sum_{k=1}^{N} \Big( \sum_{i=1}^{P} v_j^{(i)} v_k^{(i)} \Big) v'_k \qquad (2)$$
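The exchange of summation order used below can be checked numerically. The minimal sketch that follows (plain Python, with three arbitrary 6-element binary vectors chosen only for illustration, not data from this report) builds the outer-product matrix of Eq. (1) and confirms that M v' equals the superposition of the stored vectors weighted by their inner products with v'. The function names are hypothetical.

```python
from typing import List

Vector = List[int]
Matrix = List[List[int]]


def outer_product_memory(stored: List[Vector]) -> Matrix:
    """Recording step of Eq. (1): M = sum over i of v(i) v(i)^T."""
    n = len(stored[0])
    m = [[0] * n for _ in range(n)]
    for v in stored:
        for j in range(n):
            for k in range(n):
                m[j][k] += v[j] * v[k]
    return m


def retrieve_with_matrix(m: Matrix, probe: Vector) -> Vector:
    """Retrieval step of Eq. (1): v_j = sum over k of M_jk v'_k."""
    return [sum(m_jk * x_k for m_jk, x_k in zip(row, probe)) for row in m]


def retrieve_in_two_steps(stored: List[Vector], probe: Vector) -> Vector:
    """Eq. (2) with the sums exchanged: inner products first, then superposition."""
    coeffs = [sum(a * b for a, b in zip(v, probe)) for v in stored]
    return [sum(c * v[j] for c, v in zip(coeffs, stored)) for j in range(len(probe))]


# Hypothetical stored vectors and an incomplete probe of the first one.
stored = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
]
probe = [1, 0, 1, 0, 0, 0]

via_matrix = retrieve_with_matrix(outer_product_memory(stored), probe)
via_two_steps = retrieve_in_two_steps(stored, probe)
print(via_matrix, via_matrix == via_two_steps)  # [3, 2, 3, 2, 1, 1] True
```

Both routes give the same vector; only the two-step form exposes the inner products as separate quantities to which a nonlinearity can be applied.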

The attentive associative memory formulation is obtained by changing
the order of summation in Equation (2) and inserting a channel-dependent
nonlinear operation after the first summation. The resultant equation
is:

$$v_j = \sum_{i=1}^{P} v_j^{(i)}\, \lambda^{(i)}\Big[ \sum_{k=1}^{N} v_k^{(i)} v'_k \Big] \qquad (3)$$

This equation states that the imperfect input vector v' is first
compared with all the stored vectors in parallel via inner products;
each resulting scalar is transformed by a channel-dependent nonlinearity
λ(i) and then used as a coefficient in a linear superposition of the
corresponding stored vectors. The output is thus an estimate of the
stored vector that is closest to the input vector v'. The nonlinear
operation λ(i) allows one to suppress spurious correlations and to
emphasize the similarity of the input with a selected vector (i.e., to
focus attention on that particular vector).

This basic model of associative memory can be modified in numerous
ways, some of which are discussed in Reference 4. In Reference 5
Hopfield suggests an iterative procedure in which the estimate of the
retrieved vector (v in Equation (1)) is fed back to calculate an
improved estimate. In that work, the data vectors were chosen to be
binary, and this knowledge was used in hardclipping the retrieved vector
before feeding it back into the system. Reference 1 discusses the
optical implementation of the Hopfield model via an optical
vector-matrix multiplier with feedback and a threshold nonlinearity in
the feedback loop. The same procedure can be applied to the attentive
associative memory model described in Equation (3) to improve the
quality of retrieval.

In Reference 1 an extension of the Hopfield model to the storage of
images was proposed. Since images are 2-D matrices, their outer
products give a 4-D tensor (corresponding to the recording step in
Equation (1)). To facilitate the realization of this tensor, Reference 3
proposed an optical system that is equivalent to Equation (3) in that it
also computes the inner products between the images before forming the
linear superposition of the stored images. That system, however, did not
include the nonlinear step contained in Equation (3). A recent paper by
Soffer et al.⁶ discusses a holographic implementation of an associative
memory for storing images, in which the correlation between the input
and the stored images was subjected to a nonlinear operation. That
system did not provide for a channel-dependent nonlinearity and
therefore not for attention.


C. IMPLEMENTATION OF ATTENTIVE ASSOCIATIVE MEMORY

The block diagram of an attentive associative memory with iterative
retrieval is shown in Figure (1). It consists of two vector-matrix
multipliers (VMMs) connected in a loop, with a nonlinear processing step
after each of them. The first VMM projects the input vector onto the
space spanned by the stored vectors. After the projection values (or
correlations) are nonlinearly processed, a reverse projection is
performed, bringing the vector back from the projection space to the
data space. This new vector is then subjected to a nonlinearity that
reflects our a priori knowledge about the nature of the data vectors.
The processed vector is again projected onto the space spanned by the
data vectors, and the procedure is repeated until a stable state is
reached.
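The loop of Figure (1) can be sketched as follows. This is a minimal illustration under stated assumptions: the stored vectors are hypothetical 16-element binary vectors (eight 1s each, four shared between any pair), the correlation-domain nonlinearity is an offset-and-saturate ramp, and the data-domain nonlinearity is a hardclip at zero. The function names are invented for the example. The loop stops when one more pass leaves the state unchanged, which is taken here as the stable state.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def recall_until_stable(
    stored: List[Vector],
    probe: Vector,
    corr_nonlinearity: Callable[[float], float],
    data_nonlinearity: Callable[[float], float],
    max_passes: int = 20,
) -> Tuple[Vector, int]:
    """Iterate project -> nonlinearity -> back-project -> nonlinearity until stable."""
    state = list(probe)
    for n_pass in range(1, max_passes + 1):
        # First VMM: project the state onto the space spanned by the stored vectors.
        projections = [sum(a * b for a, b in zip(v, state)) for v in stored]
        weights = [corr_nonlinearity(c) for c in projections]
        # Second VMM: reverse projection back into the data space.
        estimate = [sum(w * v[j] for w, v in zip(weights, stored)) for j in range(len(state))]
        # A priori knowledge about the data (binary here) applied in the data space.
        new_state = [data_nonlinearity(x) for x in estimate]
        if new_state == state:
            return state, n_pass
        state = new_state
    return state, max_passes


def ramp(c: float) -> float:
    """Assumed correlation-domain nonlinearity: offset 4, saturation at 8."""
    return min(1.0, max(0.0, (c - 4.0) / 4.0))


def hardclip(x: float) -> int:
    return 1 if x > 0 else 0


# Hypothetical stored vectors (not the report's data).
stored = [
    [1] * 8 + [0] * 8,
    ([1] * 4 + [0] * 4) * 2,
    [1, 1, 0, 0] * 4,
    [1, 0] * 8,
]
probe = [0 if j in {0, 3, 5} else bit for j, bit in enumerate(stored[0])]
result, passes = recall_until_stable(stored, probe, ramp, hardclip)
print(result == stored[0], passes)  # True 2
```

The first pass already restores the stored vector; the second pass only confirms that the state no longer changes.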


Figure 1. The block diagram of an optical attentive associative memory.

a. Computer Simulations

The performance of this simplest model of attentive associative
memory was tested on a personal computer. A 4 × 4 binary image was
converted into a 16-element binary vector and used as the basic datum. A
set of four images was chosen for storage. Figure (2) shows the four
binary images along with their auto- and cross-correlations. These
images are highly correlated, with an auto-to-cross-correlation ratio of
only 2. The cross-correlations between all possible pairs were also
found to be uniform and equal to 4. This rather large cross-talk implies
that conventional associative memory models will have difficulty during
retrieval, since the simple model shown in Equation (1) works well only
with orthogonal or nearly orthogonal sets of vectors. Knowing the level
of cross-talk inherent in the data allows us to choose the appropriate
nonlinear transformation λ(i), shown in Figure (3). It contains an
offset (chosen to be 4 to suppress cross-talk), a linear portion, and
saturation to a value of 1 when the correlation exceeds 8. In effect, we
have forced orthogonality on the data set by incorporating the
nonlinearity shown in Figure (3). The a priori knowledge of the binary
nature of the stored vectors is used by hardclipping the retrieved
vector before feeding it back into the processor.

The results of a computer simulation are shown in Figure (4).
The conventional associative memory failed to work even when a
noise-free vector was presented for retrieval. On the other hand, as
many as three of the eight nonzero bits of a stored vector could be
missing and the attentive associative memory would still reconstruct the
correct data vector. The hardclipping, in addition to the threshold
nonlinearity in the correlation domain, gave rapid convergence, with the
correct vector being retrieved in a single step.
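The qualitative findings above can be approximated in a few lines. The sketch below does not use the actual images of Figure (2); it uses a hypothetical set of four 16-element binary vectors with the same statistics quoted in the text (eight 1s each, so an auto-correlation of 8, and a uniform cross-correlation of 4). The correlation-domain nonlinearity follows the description of Figure (3): zero up to the offset of 4, linear in between, and 1 from a correlation of 8 upward (the straight-line segment is our assumption). It checks every way of deleting three of the eight 1s, checks four deletions, and examines the conventional Eq. (1) output for a noise-free probe.

```python
from itertools import combinations
from typing import List, Sequence

# Hypothetical data with the statistics of Figure 2 (not the actual images).
STORED: List[List[int]] = [
    [1] * 8 + [0] * 8,
    ([1] * 4 + [0] * 4) * 2,
    [1, 1, 0, 0] * 4,
    [1, 0] * 8,
]


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def figure3_nonlinearity(c: float, offset: float = 4.0, saturation: float = 8.0) -> float:
    """0 up to the offset, linear in between, 1 at and above saturation."""
    if c <= offset:
        return 0.0
    if c >= saturation:
        return 1.0
    return (c - offset) / (saturation - offset)


def superpose(stored, coeffs):
    return [sum(w * v[j] for w, v in zip(coeffs, stored)) for j in range(len(stored[0]))]


def attentive_recall(stored, probe) -> List[int]:
    """Single pass of Eq. (3) followed by a hardclip at zero."""
    coeffs = [figure3_nonlinearity(inner(v, probe)) for v in stored]
    return [1 if x > 0 else 0 for x in superpose(stored, coeffs)]


def conventional_output(stored, probe):
    """Eq. (1) before any clipping: sum_i v(i) (v(i) . probe)."""
    return superpose(stored, [inner(v, probe) for v in stored])


def without(vec, positions):
    return [0 if j in positions else bit for j, bit in enumerate(vec)]


target = STORED[0]
ones = [j for j, bit in enumerate(target) if bit]

all_three_ok = all(
    attentive_recall(STORED, without(target, set(drop))) == target
    for drop in combinations(ones, 3)
)
any_four_ok = any(
    attentive_recall(STORED, without(target, set(drop))) == target
    for drop in combinations(ones, 4)
)

# Conventional memory, noise-free probe: can any threshold isolate the target bits?
y = conventional_output(STORED, target)
lowest_on = min(y[j] for j in ones)
highest_off = max(y[j] for j, bit in enumerate(target) if not bit)
print(all_three_ok, any_four_ok, lowest_on, highest_off)  # True False 8 12
```

With this data a target bit can receive less output (8) than a non-target bit (12), so no single hardclip threshold recovers the target from the conventional memory; the attentive version recovers it from all 56 three-bit deletions in one pass, but not after four deletions, when the target's correlation falls to the offset.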

Since the threshold nonlinearity essentially created an orthogonal set
of vectors, another scheme for orthogonalizing the vectors was studied.
The vectors in Figure (2) were converted into bipolar vectors by
replacing all 0s with -1s. The resulting set of four bipolar images is
shown in Figure 5(a). Since these images are orthogonal, we do get a


Figure 2 (panel title: "Storing four 16-element vectors and retrieving
them"). Four binary images to be stored and their mutual inner products
(correlations).
Figure 3. The nonlinear transfer function applied to the inner products
(corresponds to λ(k) in Figure 1); axes labeled INPUT and OUTPUT.
Figure 4. Performance of a conventional associative memory and an
attentive associative memory in recalling the images shown in Figure 2.
perfect retrieval with the conventional associative memory described in
Equation (1) when the complete input vector is presented. Figure 5(b)
shows the results of the retrieval when the input contained three bits
of error (-1 instead of +1). In the first example, error-free retrieval
was achieved. The second example, however, shows that a different
distribution of the errors leads to an incorrect retrieval. Thus, with an
orthogonal set of vectors, the cross-correlations can differ
significantly for the same number of bit errors. This is not the case
when orthogonality is achieved through a nonlinear step in the
correlation domain.

b. Optical Implementation

Figure (1) indicates that the attentive associative memory contains two
vector-matrix multipliers connected in a loop with nonlinearities
between them. The matrices in both multipliers, however, are identical.
This fact can be exploited to design an optical attentive associative
memory with bidirectional propagation of light and a common matrix mask.
A compact structure can be realized by using long finger-like modulators
and detectors to perform the broadcasting and summing operations,
respectively, that are needed in vector-matrix multipliers. The
schematic diagram of such a compact architecture is shown in Figure (6).
The active part of the system consists of two optoelectronic panels,
each containing pairs of detectors and light modulators that are
electrically connected to each other through an amplifier and a
nonlinear circuit. The current out of each detector stripe is
proportional to the sum of the light distribution on it. That signal is
then amplified and processed before being applied to the light modulator
stripe, which then broadcasts it uniformly along its entire length. The
system shown in Figure (6) is designed to store three vectors, each with
four elements. Thus the input panel contains four pairs of
detector-modulator stripes, each three elements long. The other panel is
placed in the correlation domain and contains three pairs of
detector-modulator stripes, each four elements long, oriented
orthogonally to the input panel stripes. The matrix mask sandwiched
between the two panels contains the three four-element vectors.
Figure 5. The retrieval characteristics of a conventional associative
memory with an orthogonal data set: (a) orthogonal data vectors;
(b) three errors in the first vector.

Figure 6. Compact optical architecture for an attentive associative
memory.
The initial vector can be applied to the input panel either as an
optical signal on the photodetectors or as an electrical signal to the
modulators. This vector is then broadcast to all of the vectors of the
matrix mask via the stripe modulators. The transmitted light contains
the element-by-element products of the initial vector with all the
stored vectors. Each stripe detector in the correlation domain then sums
the products along a row, thus performing a vector-vector inner product.
These inner product (correlation) results are nonlinearly amplified and
applied to the modulators, which broadcast them to the corresponding row
vectors in the matrix mask. The backward-propagating light now performs
a scalar-vector multiplication per channel. The detectors in the input
panel then form a weighted sum of all the stored vectors, thus
calculating a new estimate of the initial vector. The detector outputs
are nonlinearly amplified before driving the modulator stripes, at which
point the cycle repeats.

The system shown in Figure (6) can be emulated with discrete
off-the-shelf components, such as LEDs and photodetectors. The schematic
diagrams of the input and correlation plane panels built from LEDs and
photodetectors are shown in Figure (7). Three discrete photodetectors
connected in parallel replace each stripe detector, and three LEDs
connected in series replace each stripe modulator. One electronic
amplifier module per channel implements the desired nonlinear
amplification of the signal. An optoelectronic testbed capable of
storing four 16-bit vectors was fabricated. Figure (8) shows a
photograph of the finished unit. The initial vectors can be entered via
16 potentiometers on the front panel. The offset and gain in the
correlation domain for each of the four stored vectors can also be
controlled via 8 potentiometers on the front panel. One control adjusts
the threshold level of the hardclipping operation in the input panel. A
film mask was prepared encoding the vectors shown in Figure (2). The
operation of this unit was tested, and results consistent with the
computer simulations were obtained.
D. FUTURE WORK


Further work on this project will proceed on several fronts. Replacing
the hardclipping step with an offset-and-gain type of nonlinearity and
making the offset adaptive is the first step that will be tried. The
second most important step will be the study of the scaling issues
involved in the optical implementation of the system shown in Figure
(6), along with a study of candidate spatial light modulator designs. An
extension of the attentive associative concept to a hetero-associative
memory will be the next step. Making the stored vectors adapt in real
time will require replacing the film mask with a real-time spatial light
modulator. A study of different algorithms for incorporating learning in
the attentive associative memory will help determine how that aspect
should be pursued.
OPTICAL COMPUTING STRATEGIES
ABSTRACT


The research effort was devoted to two topics in optical
computing. The first effort is theoretical in nature: an
investigation into the inherent computational limits of
incoherent optical computing systems in terms of a lower
bound on the simultaneous volume and computing time resources
of the system. This work is a generalization of the VLSI
approach. The second effort is the development of a new
algorithm for the multiplication of two rectangular matrices,
specifically designed to take full advantage of the fact that
convolutions can be performed very rapidly using electro-optic
and acousto-optic technology.
TECHNICAL SUMMARY


OBJECTIVES

1. Development of an algorithm for the multiplication
of two rectangular matrices via convolution.

2. Development of a tractable mathematical model of
an "optical" computing system (assuming incoherent light
operations), and use of such a model to investigate the inherent
limits of computation in terms of a lower bound on the
simultaneous resources of volume and computing time.


DESCRIPTION OF WORK PERFORMED

Some of the research effort is written up as two
independent reports (probably to be published):

1. An Algorithm for Matrix-Matrix Multiplication via Convolution.

2. Lower Bounds on the Computational Efficiency of Optical
Computing Systems.

These reports are appended.


CONCLUSIONS AND RECOMMENDATIONS

With respect to the matrix algorithm, we feel that it
has the potential to be very useful. We therefore recommend
that it be implemented by a group with appropriate
equipment. To complete the theoretical treatment of the
algorithm, an additional effort should be carried out to
assess its numerical stability.

With respect to the computational efficiency problem,
it does not seem worthwhile to continue at this time,
until more is known about device technology.




out convolution operations very rapidly. Given this technical advantage,
it is worthwhile to develop an algorithm for the multiplication of two
rectangular matrices using convolution.

To this end, let us consider the matrix product $C = AB$, where $A$ is of
size $n_1 \times n_2$, $B$ is of size $n_2 \times n_3$, and $C$ is of size
$n_1 \times n_3$. Let the corresponding matrix elements be $a_{ij}$,
$b_{jk}$, and $c_{ik}$. Associate with $A$ and $B$ the polynomials $P(x)$
and $Q(x)$, with $x$ being interpreted as an indeterminate:

$$P(x) = \sum_{s=0}^{(n_1-1)n_2n_3+n_2-1} p_s\, x^s \qquad (1)$$

$$Q(x) = \sum_{t=0}^{n_2n_3-1} q_t\, x^t \qquad (2)$$

Note that the degree of $P(x)$ involves not only the size of $A$ through
$n_1$ and $n_2$ but also the size of $B$ through $n_3$. The degree of
$Q(x)$, on the other hand, involves only the size of $B$, namely $n_2$
and $n_3$. The $p_s$ and $q_t$ coefficients are related to the matrix
elements of $A$ and $B$ by

$$p_s = a_{ij} \quad \text{if } s = (i-1)n_2n_3 + j - 1 \qquad (3a)$$

$$p_s = 0 \quad \text{if } (i-1)n_2n_3 + n_2 \le s \le i\,n_2n_3 - 1 \qquad (3b)$$

$$q_t = b_{jk} \quad \text{if } t = k n_2 - j \qquad (4a)$$

$$q_t = 0 \quad \text{if } t \ge n_2n_3 \qquad (4b)$$

with $1 \le i \le n_1$, $1 \le j \le n_2$, and $1 \le k \le n_3$.

We claim that the elements of the matrix product $C$ are given by
selected coefficients of the polynomial

$$R(x) = P(x)\,Q(x)$$
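A formal proof (which is really a verification of the formulae) is now given. We begin by rewriting Eq. (6) in the form

$$ r_m = \sum_{s+t=m} p_s\, q_t , $$

where a term is nonzero only if all four of the following conditions hold:

$$ \alpha:\; s = (i-1)n_2n_3 + j - 1, \qquad \beta:\; (i-1)n_2n_3 \le s \le (i-1)n_2n_3 + n_2 - 1, $$

$$ \gamma:\; t = m - s = kn_2 - j, \qquad \delta:\; t < n_2n_3 . $$

The $\alpha$ term is simply Eq. (3a), while the $\beta$ term is the negation of Eq. (3b). The $\gamma$ term follows from Eq. (4a), while the $\delta$ term is the negation of Eq. (4b). Upon substitution of the $\alpha$ term into the $\beta$ inequality, we immediately see that this can only be true if

$$ 1 \le j \le n_2 . $$

In like fashion, substitution of the $\gamma$ term into the $\delta$ inequality requires $k \le n_3$. Adding the $\alpha$ and $\gamma$ terms then leads to the requirement that

$$ m = s + t = (i-1)n_2n_3 + kn_2 - 1 , $$

which is Eq. (7). For this $m$ the surviving products are exactly $p_s q_t = a_{ij} b_{jk}$ for $j = 1, \dots, n_2$, so that $r_m = \sum_j a_{ij} b_{jk} = c_{ik}$. Thus the formulae are verified.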
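The verification above can also be checked numerically. The following Python sketch builds the coefficient vectors from Eqs. (3) and (4), forms the convolution of Eq. (6) directly, and reads off $c_{ik}$ with Eq. (7). The function name `matmul_by_convolution` is a hypothetical label and is not taken from the report. Indices are shifted to Python's 0-based convention. An optical processor would perform the convolution step in parallel; here it is only an ordinary double loop.

```python
def matmul_by_convolution(A, B):
    """Multiply A (n1 x n2) by B (n2 x n3) with a single 1-D convolution,
    following Eqs. (3), (4), (6) and (7) with 0-based indices."""
    n1, n2 = len(A), len(A[0])
    n3 = len(B[0])
    assert len(B) == n2, "inner dimensions must agree"

    # Eq. (3): p_s = a_ij at s = (i-1)n2n3 + j - 1, i.e. s = i*n2*n3 + j (0-based); zero elsewhere.
    p = [0] * ((n1 - 1) * n2 * n3 + n2)
    for i in range(n1):
        for j in range(n2):
            p[i * n2 * n3 + j] = A[i][j]

    # Eq. (4): q_t = b_jk at t = (k-1)n2 + n2 - j, i.e. t = k*n2 + n2 - 1 - j (0-based).
    q = [0] * (n2 * n3)
    for j in range(n2):
        for k in range(n3):
            q[k * n2 + n2 - 1 - j] = B[j][k]

    # Eq. (6): r_m = sum_s p_s q_{m-s}, the coefficients of R(x) = P(x)Q(x).
    r = [0] * (len(p) + len(q) - 1)
    for s, ps in enumerate(p):
        for t, qt in enumerate(q):
            r[s + t] += ps * qt

    # Eq. (7): c_ik = r_m with m = (i-1)n2n3 + k*n2 - 1, i.e. m = i*n2*n3 + (k+1)*n2 - 1 (0-based).
    return [[r[i * n2 * n3 + (k + 1) * n2 - 1] for k in range(n3)] for i in range(n1)]
```

For example, `matmul_by_convolution([[1, 2], [3, 4]], [[5, 6, 7], [8, 9, 10]])` returns `[[21, 24, 27], [47, 54, 61]]`, the ordinary product.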

A construction which leads to the various formulae for $p_s$ and $q_t$ in terms of $a_{ij}$ and $b_{jk}$, respectively, uses row vectors. Consider a row vector $p$ whose elements $p_s$ (the coefficients of the polynomial $P(x)$) are composed of the matrix elements $a_{ij}$ of $A$ and strings of zeros, as depicted in Fig. 1A. The range of $s$ is

$$ 0 \le s \le (n_1-1)n_2n_3 + n_2 - 1 , $$

and consequently

$$ p_s = 0 \quad \text{if } s \ge (n_1-1)n_2n_3 + n_2 \qquad (13a) $$

$$ p_s = 0 \quad \text{if } (i-1)n_2n_3 + n_2 \le s \le i\,n_2n_3 - 1 \qquad (13b) $$

Furthermore, the $p_s$ are related to the $a_{ij}$ as given by Eq. (3a), as the reader can verify by construction.

In like fashion, we construct another new row vector $q$ with elements $q_t$ according to Fig. 1B. Unlike $p$, $q$ has no strings of zero elements. The range of $t$ is

$$ 0 \le t \le n_2n_3 - 1 \qquad (14) $$

so that

$$ q_t = 0 \quad \text{if } t \ge n_2n_3 . $$

Within the range of $t$, the $q_t$ are related to the $b_{jk}$ by

$$ q_t = b_{jk} \quad \text{if } t = (k-1)n_2 + n_2 - j , $$

which reduces to Eq. (4a).


As an illustrative example of the algorithm, consider the case where $A$ is $2 \times 2$ and $B$ is $2 \times 3$, so that $C$ is $2 \times 3$ (i.e., $n_1 = 2$, $n_2 = 2$, $n_3 = 3$). The upper summation limits for $P$ and $Q$ are 7 and 5, respectively, and the largest coefficient of $R$ required by Eq. (7) is $r_{11}$. The $p_s$, $q_t$ and $r_m$ coefficients evaluated according to Eqs. (3), (4) and (7) are listed in Table 1. Upon carrying out the convolution operation, Eq. (6), in conjunction with this table, we have:
$$ r_1 = c_{11} = p_0q_1 + p_1q_0 = a_{11}b_{11} + a_{12}b_{21} \qquad (17a) $$

$$ r_3 = c_{12} = p_0q_3 + p_1q_2 = a_{11}b_{12} + a_{12}b_{22} \qquad (17b) $$

$$ r_5 = c_{13} = p_0q_5 + p_1q_4 = a_{11}b_{13} + a_{12}b_{23} \qquad (17c) $$

$$ r_7 = c_{21} = p_6q_1 + p_7q_0 = a_{21}b_{11} + a_{22}b_{21} \qquad (17d) $$

$$ r_9 = c_{22} = p_6q_3 + p_7q_2 = a_{21}b_{12} + a_{22}b_{22} \qquad (17e) $$

$$ r_{11} = c_{23} = p_6q_5 + p_7q_4 = a_{21}b_{13} + a_{22}b_{23} \qquad (17f) $$

These are, of course, the matrix elements as obtained by more standard procedures.
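The bookkeeping behind Eqs. (17a)-(17f) can be generated mechanically. The Python sketch below labels each nonzero $p_s$ and $q_t$ with its matrix element via Eqs. (3a) and (4a). For every $c_{ik}$ it then lists the products that survive in Eq. (6) at the index $m$ given by Eq. (7). The function name `contributing_pairs` and the string labels such as `"a11"` are illustrative assumptions. The labels are only unambiguous for single-digit indices.

```python
def contributing_pairs(n1, n2, n3):
    """For each c_ik, return (m, terms), where terms lists the (s, t, a-label, b-label)
    products p_s q_t that are nonzero in Eq. (6). Indices are 1-based, as in the text."""
    # Nonzero coefficients only, per Eqs. (3a) and (4a).
    p_label = {(i - 1) * n2 * n3 + j - 1: f"a{i}{j}"
               for i in range(1, n1 + 1) for j in range(1, n2 + 1)}
    q_label = {(k - 1) * n2 + n2 - j: f"b{j}{k}"
               for j in range(1, n2 + 1) for k in range(1, n3 + 1)}
    table = {}
    for i in range(1, n1 + 1):
        for k in range(1, n3 + 1):
            m = (i - 1) * n2 * n3 + k * n2 - 1  # Eq. (7)
            terms = [(s, m - s, p_label[s], q_label[m - s])
                     for s in sorted(p_label) if (m - s) in q_label]
            table[f"c{i}{k}"] = (m, terms)
    return table


for name, (m, terms) in contributing_pairs(2, 2, 3).items():
    lhs = " + ".join(f"p{s}q{t}" for s, t, _, _ in terms)
    rhs = " + ".join(f"{a}{b}" for _, _, a, b in terms)
    print(f"r{m} = {name} = {lhs} = {rhs}")
```

For $n_1 = 2$, $n_2 = 2$, $n_3 = 3$ the first printed line is `r1 = c11 = p0q1 + p1q0 = a11b11 + a12b21`. The remaining lines reproduce (17b)-(17f) in the same order.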

The implementation of the algorithm can be carried out in a straightforward fashion by re-examination of Figs. 1A and 1B. Note that the row vector $p$ in Fig. 1A consists of the rows of $A$ with zeros interspaced between them, and the number of zeros is fixed (namely $n_2(n_3-1)$ zeros between consecutive rows, by Eq. (3b)). Thus we can easily handle the vector $p$ containing the matrix elements $a_{ij}$. The row vector $q$, containing the matrix elements $b_{jk}$, consists of the columns of $B$, each with its elements in reverse order (see Fig. 1B). This vector is also easily handled in the implementation.
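The row-vector construction just described can be sketched directly. The sketch concatenates the rows of $A$ with fixed zero strings to form $p$, and concatenates the reversed columns of $B$ to form $q$. It is written in Python under the assumption that Figs. 1A and 1B show exactly this layout; the helper names `p_vector` and `q_vector` are hypothetical. The result agrees with Eqs. (3) and (4).

```python
def p_vector(A, n3):
    """Sketch of the Fig. 1A layout: the rows of A, separated by
    a fixed string of n2*n3 - n2 zeros between consecutive rows."""
    n2 = len(A[0])
    p = []
    for i, row in enumerate(A):
        if i > 0:
            p.extend([0] * (n2 * n3 - n2))  # fixed-length zero string, Eq. (3b)
        p.extend(row)
    return p


def q_vector(B):
    """Sketch of the Fig. 1B layout: the columns of B in order,
    each column with its elements reversed (Eq. (4a))."""
    n2, n3 = len(B), len(B[0])
    return [B[j][k] for k in range(n3) for j in reversed(range(n2))]
```

For the $2 \times 2$ by $2 \times 3$ example, `p_vector([[1, 2], [3, 4]], 3)` gives `[1, 2, 0, 0, 0, 0, 3, 4]` and `q_vector([[5, 6, 7], [8, 9, 10]])` gives `[8, 5, 9, 6, 10, 7]`. Convolving the two vectors and reading $r_1, r_3, \dots, r_{11}$ then yields the entries of $C$.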
SECTION  TWO

LOWER  BOUNDS  ON  THE  COMPUTATIONAL 
EFFICIENCY  OF  OPTICAL  COMPUTING  SYSTEMS 


The advent of Very Large Scale Integrated (VLSI) circuitry has led to a considerable decrease in the physical size of computers, with a corresponding increase in speed of execution of operations. Basically there are three interrelated aspects to VLSI: design and fabrication of the chips, design of systems which use these chips for specific applications, and development of algorithms which utilize the inherent capabilities of such chips. The revolution in computer science, for both numerical and nonnumerical applications, brought about by VLSI continues unabated.

The computational limitations of VLSI were first investigated by Thompson [1]. For an introduction to this work see the basic text of Ullman [2], which contains references to subsequent work. It has been shown that any VLSI circuit with area $A$ and time $T$ requires at least $AT^2 = \Omega(n^2)$ to solve various computational problems such as the $n$-point FFT, $n$-point convolution, and $\ell \times \ell$ matrix multiplication (for which $n = \ell^2$). The symbol $\Omega$ is defined in Ullman: $f(n) = \Omega(g(n))$ means that there exists a positive constant $c$ such that for an infinite number of values of $n$ we have $f(n) \ge c\,g(n)$.

Nevertheless, VLSI suffers from the limitation that the technology upon which it relies is inherently two-dimensional. Snyder's recent review [3] contains a very useful discussion of the planarity constraints imposed by VLSI. In particular, conventional VLSI chips are constructed by superposing a small number of layers on top of a substrate. This substrate has a thickness which is orders of magnitude greater than the size of the transistors and the wire width. Input and output from a conventional VLSI chip must be made by a limited number of pads located on the sides of the chip. VLSI chip technology is changing almost daily; however, some of the more basic aspects are discussed in Barbe [4] and Einspruch [5]. Although an ensemble of two-dimensional chips can be placed on top of each other with holes drilled down through them for interchip communication, the total number of layers is seriously limited by the substrate thickness of each chip; consequently the resulting device cannot properly be termed "three-dimensional VLSI". For this reason, it appears that truly three-dimensional VLSI will most likely not be possible to fabricate. Nevertheless, some interesting theoretical investigations of three-dimensional VLSI have been carried out: Rosenberg [6], and Leighton and Rosenberg [7].

The purpose of the present communication is to summarize investigations into various aspects of the computational performance of three-dimensional devices which make hybrid use of electronic and optical components to perform operations. Our goal is to facilitate general statements on such electro-optical computations, with specific reference to lower bounds on their complexity. Since such devices may contain a large number of components, we term them VLSIO, with the O denoting optics.

We note that a very useful overview of optical computing (more properly, electro-optical computing) may be found in Caulfield et al. [8].

In order to carry out such an analysis, we outline the development of an abstract model of VLSIO which is essentially technology independent but incorporates the physical restrictions of light beam propagation as expounded by Gabor [9], especially with respect to the very important fact that the amount of information passing through a cube of small volume is bounded. This physical constraint allows us to adapt previous VLSI lower bound arguments to the VLSIO situation. It also allows for comparisons of electro-optical computing devices in terms of their volume $V$ and the time $T$ taken by VLSIO on a given input (the number of time units that elapse from the first input signal until the last output signal). We avoid making assumptions about the precise physics of the devices utilized, since this would only limit the later application of these ideas as the physical models are improved and modified. Optical physics (through Gabor's theorem) implies an upper limit on the rate of information transfer across an optical beam, and hence a lower bound on the computational efficiency of VLSIO. In addition, we assume that any 2-D convolution of an $n \times n$ array of points can be achieved by a VLSIO device in a unit time step. This assumption is reasonable because optical devices which perform in this way already exist.

Note that all the variables and functions are taken to be Boolean (i.e., the values of the variables are taken from $\{0, 1\}$).

We begin by discussing the well known abstract two-dimensional model of a VLSI chip as an $L_1 \times L_2$ grid graph with height ($\ll L_1$ or $L_2$) held constant. The distance between grid points is $w$, the feature width. The chip processors are located at various distinct nodes of the grid graph, with each processor storing a state consisting of $b$ bits. Furthermore, the processors execute synchronously on a step consisting of a time unit of duration $\tau$ seconds. The remaining nodes are used for wire routing, or for input and output pads. Each wire can run along a path in the grid graph from an input pad, or a processor, to various output pads, or processors. Wires are not allowed to intersect. On each time step, a value consisting of $b$ bits of information is transmitted across the wire grid from either an input pad or a processor. The state of each processor is then updated on each step by a fixed function of the values transmitted by the wires leading into the processor and of the state of the processor in the previous step. The unit step transmission time across wires is justified by the fact that wire transmission can generally be made faster than transistor switching. This remarkably simple model is sufficient to determine the computational efficiency of VLSI devices.

Following the two-dimensional version, the fundamental building block of our VLSIO device is the optical box $B$. It is a parallelepiped having lengths $L_1$, $L_2$, and $L_3$, with input and output faces $F_{in}$ and $F_{out}$. These faces are assumed to take as input and give as output two-dimensional integer arrays $I(x,y)$ and $O(x,y)$, respectively. For convenience, we consider the input sources and output detectors to be very small compared to the size of the optical box (in order to minimize optical diffraction effects); furthermore, they are uniformly spaced a distance $w$ apart. The input sources are taken to be LEDs (light-emitting diodes), and the detectors are unspecified except to state that they are sensitive only to the intensity of the LED radiation.

We remind the reader that most electro-optical computations are now performed via incoherent, geometrical-optics-based processors and not by coherent, Fourier-transform-based processors. The ancillary optical equipment (lenses, prisms, gratings, etc.) which spreads and then collects the light can be neglected in this version of the abstract model.

The output array is computed on each time step of duration $\tau$ as a fixed function $A_B$ of the input array; $A_B$ will, of course, depend upon the detailed optical characteristics of $B$.

The optical box, in addition to being three-dimensional, also differs from VLSI in another way: optical beams rather than wires provide storage and cross-flow. Since the modus operandi is incoherent radiation, these beams can intersect without interacting. The basic question that now arises is: "to what extent do optical (laser) beams behave as wires?"

A wire can only transport information at a finite rate depending upon wire cross-section, skin effects, etc. We would expect an optical beam to behave similarly, notwithstanding its greater information rate. This problem has already been addressed by Gabor [9], who studied the "metrical information" in a light beam. The conclusion that he draws is that a light beam always has a finite upper limit with respect to information rates, the limit depending upon the wavelength of the light, the smallest effective beam area, the solid angle of divergence, etc. We need not concern ourselves with explicit formulae; for our purposes it suffices that we can interpret an optical beam as a wire.

Given this equivalence, we turn to the important problem of determining lower bounds (in terms of simultaneous volume and time) on the computational resources required for VLSIO to solve various problems.

In  order  not  to  unduly  lengthen  the  text,  it  is  assumed  that  the  reader 
is  familiar  with  Sections  1.4,  2.1  and  2.2  of  Ullman's  basic  text  [2]. 




Consider a Boolean function $f$ with a set $X$ of $n$ input variables and a set $Y$ of $m$ output variables. Let $X'$ be a subset of $X$; also let $P = (X_L, X_R, Y_L, Y_R)$, where $X_L, X_R$ and $Y_L, Y_R$ are partitions of $X$ and $Y$, respectively. We term $P$ balanced if between one-third and two-thirds of $X'$ lies in $X_L$, and denote it by $P_B$. If $\alpha$ and $\beta$ are two input assignments, then we term them a fooling pair of assignments to $X$ if:

1) output $Y_L$ is distinct for input assignments $\alpha(X)$ and $\alpha(X_L)\beta(X_R)$;

2) output $Y_R$ is distinct for input assignments $\beta(X)$ and $\alpha(X_L)\beta(X_R)$.

In addition, let the fooling set for $P$ be a set $A$ of assignments to $X$ such that for all distinct $\alpha, \beta \in A$, at least one of $(\alpha, \beta)$, $(\beta, \alpha)$ is a fooling pair.

Finally,  we  require  that  the  locations  and  times  of  the  input  and  output 
are  given  only  once. 

Crucial to the analysis is the concept of information content (essentially "the amount of information that must cross a boundary in order to solve the problem"). Formally, the information content of the Boolean function $f$ is

$$ I_f = \max_{X'} \; \min_{P_B} \; \max_{A} \; \log(|A|) \qquad (1) $$

where $A$ denotes the fooling set corresponding to $P_B$. The following functions (of importance in electro-optical computing) are known to have information content $I_f = \Omega(n)$:

a) $n$-point discrete Fourier transforms.

b) multiplication and inversion of two $\ell \times \ell$ matrices, where $n = \ell^2$.

c) $n$-point convolution.


The following important result on lower bounds is due to Thompson [1, 2]: any two-dimensional VLSI chip computing a Boolean function $f$ requires simultaneous area $A$ and time $T$ satisfying $AT^2 = \Omega(I_f^2)$.

We now prove: any three-dimensional "optical box" computing a Boolean function $f$ requires simultaneous volume $V$ and time $T$ satisfying $VT^{3/2} = \Omega(I_f^{3/2})$.

The proof (which we now sketch) is an adaptation of the two-dimensional technique. Let the device be a parallelepiped having dimensions $L_1 \times L_2 \times L_3$, with volume $V = L_1L_2L_3$. Choose $X'$ to be the subset of $X$ such that $I_f = I_f(X')$. For each $i = 1, 2, 3$ we can find a cut $C_i$ of area

$$ A_i \le 2V/L_i, \qquad i = 1, 2, 3 \qquad (2) $$

which disconnects the device into two components, each of which contains at most two-thirds, but no less than one-third, of the inputs of $X'$. By definition, at least $I_f$ bits must be transported across each cut; this requires time

$$ T \ge I_f / A_i \qquad (3) $$

Multiplying the three inequalities (3), and using (2) to obtain $A_1A_2A_3 \le 8V^3/(L_1L_2L_3) = 8V^2$, we find

$$ V^2 T^3 \ge \tfrac{1}{8} A_1A_2A_3 T^3 \ge \tfrac{1}{8} I_f^3 \qquad (4) $$

which is the sought-for result, since it gives $VT^{3/2} \ge I_f^{3/2}/(2\sqrt{2})$. The main point to emphasize is that this result depends upon the fact that we can treat light beams as if they were wires.

An immediate consequence of this theorem is that the lower bounds for optical computing are:

a) $n$-point convolution or $n$-point discrete Fourier transforms:

$$ VT^{3/2} = \Omega(n^{3/2}) \qquad (6) $$

b) multiplication and inversion of two $\ell \times \ell$ matrices, where $n = \ell^2$:

$$ VT^{3/2} = \Omega(\ell^3) \qquad (7) $$

These results follow from the statements quoted after Eq. (1). Equations (6) and (7) represent the lower bound performance of these two operations in terms of volume and time. It is important to remember that these bounds are a consequence of the fact that we allow the entire volume of $B$ to be operative.
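As a worked restatement (our arithmetic, not an additional result of the report), Eqs. (6) and (7) follow by substituting $I_f = \Omega(n)$ into the theorem. The same substitution in Thompson's two-dimensional bound is shown for comparison, with $n = \ell^2$ in the matrix case:

```latex
\begin{aligned}
I_f = \Omega(n) &\;\Rightarrow\; VT^{3/2} = \Omega\!\left(I_f^{3/2}\right) = \Omega\!\left(n^{3/2}\right), \\
n = \ell^2 &\;\Rightarrow\; VT^{3/2} = \Omega\!\left((\ell^2)^{3/2}\right) = \Omega\!\left(\ell^3\right), \\
\text{2-D (Thompson):}\quad & AT^2 = \Omega\!\left(I_f^2\right) = \Omega\!\left(n^2\right) = \Omega\!\left(\ell^4\right).
\end{aligned}
```

Because $V$ and $T$ appear together, these are trade-off bounds. For instance, for $n$-point convolution a device of volume $V = \Theta(n)$ still needs $T = \Omega(n^{1/3})$ time units.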
ABSTRACT

Our thrust is to apply optical data processing techniques to specific SDI problems, with attention to the presence of time-varying input data and the processing of more than one input frame of data. The specific applications defined in year one include: processing of time-varying imagery, detection and tracking of sub-pixel targets, optical Kalman filtering, and adaptive matched spatial filters. In each of these areas, new concepts, ideas, architectures, and algorithms suitable for optical data processing have


TECHNICAL  SUMMARY 
1.  OBJECTIVES 


The  objectives  of  this  effort  are: 

1.  to  apply  existing  optical  data  processing  engineering  techniques  to  new  SDI 
applications,  and 

2.  to  define  and  develop  new  optical  data  processing  techniques  for  new  SDI 
applications. 

In area (1) above, we include optical matrix-vector processing. However, even within this research area, there is a significant amount of basic research concerning algorithms, architectures, and number representations, on which we have concentrated our work. In area (1), we also include optical correlation. New work in this area concerns new smart filters and adaptive filters. In area (2), we have addressed time-varying targets, for which target detection decisions based upon multiple frames of data, rather than one snapshot frame of data, are required. We have also addressed the processing of sub-pixel target objects. These represent new research areas for optical data processing that are quite suitable for SDI applications.
2.  SUMMARY  OF  WORK  PERFORMED  AND 
RESULTS  OBTAINED 

We  highlight  our  results  below,  including  references  to  published,  submitted  and 
planned  papers  in  the  separate  topical  areas  associated  with  each  major  task.  All 
references  that  are  not  available  presently  in  the  open  literature  are  included  as 
appendices  to  this  report.  Brief  summaries  of  these  major  topical  areas  now  follow. 

2.1  OPTICAL  MATRIX-VECTOR  PROCESSING 

We  have  addressed  extended  Kalman  filtering  applications  in  our  matrix-vector 
studies.  Our  progress  has  been  excellent  in  this  area.  This  research  area  has  been  the 
task that has been given major emphasis in 1985. We now highlight the various sub-areas associated with this task.

2.1.1  Architectures 

A laboratory prototype of a new basic matrix-vector optical architecture [1], using laser diode point modulators and a one-channel AO cell with frequency-multiplexing and multiple output detectors, has been assembled. This architecture is quite versatile. It allows bipolar and complex-valued data handling. It also allows the use of analog processing or digital-encoded DMAC (digital multiplication by analog convolution) processing, as well as multi-level DMAC processing. These multiple processing modes are all achievable on the same processor, a unique feature for any architecture (optical, or conventional digital multiprocessor). Tests and quantitative data on the performance of this system are expected in 1986. A second new architecture recently devised by us [2] uses a multi-channel AO cell, multiple input point modulators (achieved via an AO cell) and a linear detector CCD array (with a single output channel). This architecture uses multi-channel AO cells to increase accuracy and processing capacity, but does not significantly increase the input/output and electronic support required. We have assembled the initial laboratory electronic support system for the first architecture [1] and are presently revising this system for our new second architecture. This is a major 1986 task item included in our proposed work. This is a quite significant effort in itself, but it is necessary and vital to obtain quantitative data and to demonstrate a real-time optical linear algebra laboratory system capable of reasonable problem solutions, rather than initial demonstrations only. We expect to learn much concerning future research direction from these initial lab tests.
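The DMAC idea mentioned above can be illustrated in a few lines of Python. This is a sketch under our own assumptions: unsigned integers, plain binary encoding, least-significant bit first, and hypothetical function names. It is not the processor's actual data path, and the multi-level and multi-channel variants are not modeled. The convolution of the two bit streams, which the optical system would compute in analog form, yields mixed-radix digits that may exceed 1. A shift/add pass then converts them back to an ordinary binary value.

```python
def to_bits(x, width):
    """Binary digits of a non-negative integer x < 2**width, least-significant bit first."""
    return [(x >> k) & 1 for k in range(width)]


def convolve(a, b):
    """Discrete linear convolution of two digit sequences (the step done optically in DMAC)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def dmac_multiply(x, y, width=8):
    """Sketch of digital multiplication by analog convolution for unsigned integers.
    Returns the mixed-radix digits and the value recovered by shift/add."""
    mixed = convolve(to_bits(x, width), to_bits(y, width))
    value = sum(d << k for k, d in enumerate(mixed))  # shift/add back to binary
    return mixed, value
```

For example, `dmac_multiply(13, 11, width=4)` gives the mixed-radix digits `[1, 1, 1, 3, 1, 1, 1]`, which shift/add to 143. The digit 3 is the kind of value an analog detector must resolve, which is why accuracy requirements grow with word length.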


2.1.2  Algorithms 


Parallel  algorithms  are  necessary  for  parallel  processors,  such  as  our  optical 
systems.  We  have  devised  a  new  LU  decomposition  algorithm  suitable  for  our  second 
architecture  [2].  This  algorithm  is  unique  since  it  also  marries  the  concepts  of 
algorithms  and  architectures,  i.e.  the  algotecture  concept  advanced  earlier. 
2.1.3  Number  Representations 

This  fertile  research  area  refers  to  the  fact  that  optical  processors  should  not 
necessarily  use  conventional  number  representations  employed  in  digital  processors. 
Our  work  in  this  area  has  been  most  fruitful  and  has  resulted  in  a  unique  negative  base 
number  representation  [3]  that  we  recently  devised.  This  number  representation  is  most 
suitable  for  optical  processors  using  multi-channel  and  digital  or  multi-level  DMAC 
algorithms  for  improved  accuracy. 
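This summary does not spell out the negative-base representation of [3]. As an assumed stand-in, the sketch below uses ordinary base -2 (negabinary) digits to show why a negative base suits convolution-based multiplication. Every digit is 0 or 1, yet negative values need no sign bit. Because convolution is bilinear, evaluating the convolved digit stream at base -2 returns the signed product directly. The function names are hypothetical.

```python
def to_negabinary(n):
    """Digits of integer n in base -2, least-significant first; every digit is 0 or 1."""
    digits = []
    while n != 0:
        r = n % 2            # Python's % returns 0 or 1 here, even for negative n
        digits.append(r)
        n = (n - r) // -2    # exact division, since n - r is even
    return digits or [0]


def from_digits(digits, base=-2):
    """Evaluate a digit sequence (least-significant first) in the given base."""
    return sum(d * base ** k for k, d in enumerate(digits))


def negabinary_dmac(x, y):
    """Signed multiply: convolve the base -2 digit streams, then evaluate at base -2."""
    a, b = to_negabinary(x), to_negabinary(y)
    mixed = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            mixed[i + j] += ai * bj
    return from_digits(mixed)
```

For example, `to_negabinary(6)` is `[0, 1, 0, 1, 1]` (that is, 16 - 8 - 2). The call `negabinary_dmac(-3, 5)` evaluates to -15 without any separate sign handling.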

2.1.4  Case  Studies 

The  extended  Kalman  filter  (EKF)  was  selected  as  our  major  case  study  to  be 
detailed  and  studied.  Our  work  in  this  area  has  been  most  productive  [4,5]  and  fruitful 
during  this  brief  six-month  period.  We  have  developed  a  new  factorized  EKF 
algorithm.  We  have  married  this  algorithm  with  our  new  second  optical  linear  algebra 
architecture  and  our  new  LU  decomposition  algorithm.  We  have  detailed  its  data  flow 
also.  These  features  are  lacking  in  most  other  algorithms.  We  have  noted  the  increased 
efficiency  and  memory  storage  requirements  that  our  new  algorithm  provides.  Kalman 
filtering  algorithms  require  floating  point  operations  and  performance  in  all  instances. 
We  have  thus  detailed  how  to  achieve  floating  point  accuracy  on  our  optical  processor 
[5].  Initial  tests  [4]  (to  be  more  fully  addressed  and  detailed  in  1986)  indicate  that  this 
factorized  algorithm  requires  a  significantly  less  accurate  processor  than  does  the 
conventional  Kalman  filtering  algorithm.  We  intend  to  pursue  this  extensively  in  our 
1986  research. 

2.2  TIME-VARYING  AND  SUB-PIXEL  TARGET  DETECTION 

The optical processing of time-varying images and the optical processing of sub-pixel targets are two new optical data processing techniques to which initial attention has been given in our work and for which initial results have been obtained (October 1985 SDI presentation and associated viewgraph report). The concept we use in this processing involves modeling the background frame-to-frame as correlated noise with specified correlation lengths, means and variances. The correlation of two such frames of background data thus results in a predicted correlation function shape. We employ exponential and parabolic models for these data correlations. From several (9 or 25) samples of the cross-correlation of two successive frames of data, we can fit the samples obtained from this correlation plane to our models and estimate the sub-pixel shift between the two frames of imagery. We then apply this sub-pixel shift and the associated interpolation to the second image frame (this can be performed optically and is a 1986 task, as well as a new optical data processing technique and algorithm). We then difference the resultant two frames of data (this is also most suitable for optical implementation and is included in our 1986 proposed optical data processing research). The result is a frame with only moving targets present. Full documentation of this algorithm and of our initial results obtained with it is an additional 1986 task item. Extensions of our initial results are also most promising and are included as 1986 tasks. Our initial work and demonstrations represent the first application of optical data processing to time-varying data and sub-pixel target detection, and the first and only successful demonstration (simulation) of such a concept.
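The exact fitting procedure for the 9- or 25-sample case is not given in this summary. As a simplified, assumed stand-in for the parabolic model, the Python sketch below takes the 3 x 3 (9-sample) neighborhood of the integer cross-correlation peak. It fits a 1-D parabola through the center row and another through the center column, and returns the sub-pixel offset of each vertex. The function names are hypothetical. A least-squares fit over all samples, or an exponential model, would be alternatives.

```python
import numpy as np


def parabolic_vertex(left, center, right):
    """Offset (in samples) of the vertex of the parabola through
    (-1, left), (0, center), (1, right)."""
    denom = left - 2.0 * center + right
    return 0.0 if denom == 0 else (left - right) / (2.0 * denom)


def subpixel_shift(corr3x3):
    """Estimate (dy, dx) from a 3x3 block of cross-correlation samples centered on
    the integer peak, using separable parabolic fits (assumed model)."""
    c = np.asarray(corr3x3, dtype=float)
    dy = parabolic_vertex(c[0, 1], c[1, 1], c[2, 1])  # center column, rows y = -1, 0, 1
    dx = parabolic_vertex(c[1, 0], c[1, 1], c[1, 2])  # center row, columns x = -1, 0, 1
    return dy, dx
```

If the correlation surface really is a paraboloid, for example `10 - (y + 0.2)**2 - (x - 0.3)**2` sampled on the 3 x 3 grid, the sketch recovers `(dy, dx) = (-0.2, 0.3)` up to rounding. The estimate would then drive the interpolation and differencing steps described above.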

2.3  ADAPTIVE  MATCHED  SPATIAL  FILTERS 

Our  architecture  and  algorithm  for  this  concept  have  been  defined  and  are 
summarized  in  a  recent  conference  paper  [6]  and  in  the  SDI  Washington,  D.C.,  October 
1985  presentation  (and  its  associated  viewgraph  report).  The  objective  of  this  task  is  to 
update  filters  with  new  time-history  input  data.  We  achieve  this  with  a 
computationally-efficient eigenvector algorithm. To update filters as required, we use a recursive algorithm. We have thus far formulated this concept and selected several
algorithms  for  use  and  for  demonstrations.  This  represents  a  significant  achievement. 
Details  are  expected  in  1986.  Our  1985  objective  was  primarily  one  of  problem 
definition  and  problem  formulation. 
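The specific eigenvector and recursive algorithms are not identified in this summary. Purely as an illustration of the kind of recursion involved, and not as the authors' method, the sketch below does two things per new input vector:
- it keeps an exponentially weighted correlation-matrix estimate (forgetting factor `lam`, an assumed parameter);
- it refines a dominant-eigenvector estimate with one power-iteration step, warm-started from the previous estimate.

The function name `update_filter` is hypothetical.

```python
import numpy as np


def update_filter(R, v, x, lam=0.9):
    """One recursive step (illustrative only).
    R: current correlation-matrix estimate, v: current unit eigenvector estimate,
    x: new input data vector, lam: forgetting factor (assumed)."""
    x = np.asarray(x, dtype=float)
    R = lam * R + (1.0 - lam) * np.outer(x, x)  # recursive correlation-matrix update
    w = R @ v                                   # one power-iteration step from the previous v
    norm = np.linalg.norm(w)
    return R, (w / norm if norm > 0 else v)     # keep the old estimate if R v vanishes
```

Each update costs one outer product and one matrix-vector product, the kind of operation an optical matrix-vector processor performs well. Feeding alternating inputs `[3, 3, 0]` and `[1, -1, 0]` from a start of `[1, 0, 0]` settles on the dominant direction `[1, 1, 0]/sqrt(2)`.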

2.4  CONCLUSIONS  AND  RECOMMENDATIONS 

No  definitive  progress  was  made  on  our  originally  proposed  nonlinear  algorithms 
and  neural  processor  work.  This  was  due  to  the  nature  of  the  contract,  in  which 
attention  was  given  to  educating  students  to  provide  such  data,  rather  than  employing 
faculty  time  to  provide  new  innovative  research  ideas.  In  1986,  we  propose  to  pursue 
these  two  areas  to  at  least  initial  levels. 


In  1985,  we  have  extensively  analyzed  our  new  optical  matrix-vector  Kalman  filter 
algorithm,  architecture  and  number  representation.  All  aspects  of  this  work  are 
proceeding  quite  well.  Considerable  basic  research  on  architectures,  algorithms,  and 
number  representation  has  evolved  from  this  effort.  This  will  be  pursued  further  in 
1986.  Similar  remarks  apply  to  our  sub-pixel  and  time-varying  target  cases.  Initial  new 
ideas  on  neural  processors  and  revised  approaches  to  conventional  methods  to  address 
such  processors  have  been  considered  (conceptually)  and  will  be  included  as  1986 
proposed  research.  Our  major  goal  is  to  integrate  optical  image  processing  and  optical 
linear  algebra.  This  relates  to  the  use  of  imaging  and  other  sensors  to  initialize  an 
optical  Kalman  filter  for  multi-target  tracking  applications. 
Abstract 

Kalman filtering imposes formidable linear algebra computational requirements for each new input measurement vector. An architecture-motivated implementation of a discrete-time extended Kalman filter (KF) algorithm is presented. This particular formulation takes advantage of the following features of the optical processor architecture: the ability to perform matrix-vector operations, floating-point capabilities, and specially designed matrix-vector LU decomposition operations. A factorized $LDL^T$ algorithm is used to propagate the covariance matrices between sample times. The air-to-air missile guidance problem is used as a case study, wherein an extended Kalman filter (EKF) is required due to the nonlinear nature of the measurement equations.
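The factorized form mentioned above rests on writing a symmetric positive-definite covariance matrix as $P = LDL^T$, with $L$ unit lower-triangular and $D$ diagonal, so that no square roots are needed. The NumPy sketch below shows only that factorization, using the textbook column-by-column recurrences. How the report propagates the factors between sample times is not reproduced here, and the function name `ldlt` is a hypothetical label.

```python
import numpy as np


def ldlt(P):
    """Factor a symmetric positive-definite matrix as P = L diag(d) L^T
    (L unit lower-triangular), without square roots."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Diagonal entry: subtract contributions of the previously computed columns.
        d[j] = P[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (P[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d
```

For example, `P = [[4, 2, -2], [2, 10, 2], [-2, 2, 6]]` factors with `d = [4, 9, 4]`. Positive `d` entries confirm that `P` is positive definite, one reason the factorized form is numerically attractive for covariance propagation.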

Introduction 

The pattern in the evolutionary cycle of architectural and algorithmic development is well established in the digital processor world, and we can now see a similar cycle forming in the optical processor world. Optical linear algebra processors (OLAP) [1] are attractive general-purpose systems for performing various linear algebra matrix-vector operations. They are attractive by virtue of their parallelism, high computational speeds, the ability to perform global operations, and their small size, weight, and power consumption. They are general-purpose systems that can solve a broad range of systems of linear algebra equations and perform a matrix-vector multiplication in a single step. Taking advantage of this capability, a novel method of performing LU decomposition was devised [2, 3]. This is an example of an architecture-motivated algorithm implementation.
In Section 4, we review one OLAP architecture and discuss how it achieves bipolar and floating-point data capabilities. These data-handling methods will be seen to be an example of an algorithm-motivated architecture. In Section 4, we review the fundamental KF and EKF requirements and establish our notation. The purpose of this paper is to provide an architecture-motivated implementation of the EKF for use in missile guidance applications. We seek an implementation of the EKF algorithm which makes optimum use of OLAP features, specifically matrix-vector intensive operations and the use of the novel LU realization. The factorized EKF algorithm is advanced in Section %. All steps in the algorithm are detailed, and therein the reader can see how it is designed with attention to the OLAP architecture.
Bierman and Thornton [4]-[7] provide the motivation for the study of factorized algorithms and single-precision EKF processors. Our application and algorithm extend this earlier work. In an EKF, the calculation of the covariance matrices is the major on-line operation required. Various formulations [4]-[9] have been advanced for this problem. Several digital [8, 9] and optical [10, 11] KF and EKF processors have been suggested, and many OLAP architectures have been discussed [1]. However, only one paper [10] has detailed the realization of an EKF on an optical processor. Our present algorithm offers improved numerical stability and reduced computational accuracy requirements. The computations required in an EKF are of general use in other applications, and thus these results should be of widespread use in areas besides highly maneuverable target missile guidance with measurements that are nonlinear in Cartesian coordinates.

2  High-Accuracy Optical Linear Algebra Processors

The general-purpose OLAP used [3] is shown in Figure 1. It consists of a linear
array of M point modulators stacked vertically at P1. These are imaged vertically and
expanded horizontally onto P2, which contains an N-channel acoustooptic (AO) cell. For
discussion purposes, we consider M vertical regions of the AO cell at P2. Each P1 point
modulator uniformly illuminates one vertical region covering all N AO channels at P2.
Plane P2 is imaged horizontally and integrated vertically onto P3. For simplicity, we
view P3 as a shift-register linear detector array (the exact P3 system is more complex
and is detailed elsewhere [3]). One vertical region originating at P1, passing through the
corresponding vertical region of P2, and focused onto P3 is called a processor channel;
there are M processor channels in the system of Figure 1.

In Figure 1 we show a system with all M processor channel outputs summed at P3.
To achieve high accuracy, this system implements a multiplication-by-convolution
algorithm [12, 13, 14], which we now briefly review. In this algorithm, to multiply two
numbers, their encoded bit streams are convolved (this yields a mixed-radix product)
and the result is converted to the original encoded representation (by a simple shift/add
procedure). To demonstrate, we multiply two binary-encoded numbers on the system of
Figure 1 by considering a single-channel version of Figure 1, i.e., M = 1. The bits of the
first number, s1, are fed serially to P1 and the bits of the second number, s2, are fed
word-parallel to P2. For N-bit words, a new bit enters P1 each T1 seconds and a new
word enters P2 each T2 seconds, where T2 = N T1. The data incident on P3 each T1
are either s2 or zero (depending on the input bit at P1). The contents of P3 are shifted
by one digit each T1 (our P3 detector system achieves this). At the next T1, the
incident data (again either s2 or zero) are added to the shifted output data produced at
the prior T1. One new mixed-binary output bit is thus produced each T1 (i.e., each
time the contents of P3 are shifted), the full product is available after T2 seconds,
and the 1-D output (in time) from P3 is the convolution of the bits of s1 and s2, i.e., the
mixed-radix product s1 s2. Thus, the P3 output s_m3(t) from the m-th channel is the
convolution (denoted by *) of the data sequence s_m1(t) fed to the m-th point modulator
at P1 and the data s_m2(x) present in the m-th region of P2; i.e.,

$$s_{m3}(kT_2) = s_{m1} * s_{m2}, \qquad (1)$$

where k is the discrete time index. This algorithm is quite versatile, since the signal
sequences s_m1 and s_m2 can be multi-level or binary encoded representations of the two
numbers [3]. In either case, the product in (1) is in mixed-radix representation. The
outputs are easily converted to the original encoding representation by a single adder
and A/D converter on the one P3 output line [3].
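The multiplication-by-convolution idea can be checked numerically. The Python sketch below is an arithmetic model only (no optics, no timing); it assumes non-negative integer operands held least-significant digit first, and the helper names `to_digits`, `convolve`, and `resolve_mixed_radix` are illustrative, not from the source. The carry-free convolution plays the role of the mixed-radix P3 output, and the shift/add loop is the conversion back to the original encoding.

```python
def to_digits(value, ndigits, radix=2):
    """Least-significant-first digits of a non-negative integer in base `radix`."""
    return [(value // radix**i) % radix for i in range(ndigits)]


def convolve(a, b):
    """Carry-free convolution of two digit sequences: the mixed-radix product."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def resolve_mixed_radix(digits, radix=2):
    """Shift/add conversion of a least-significant-first mixed-radix sequence."""
    total = 0
    for digit in reversed(digits):      # start from the most significant position
        total = total * radix + digit   # shift by one position, then add the digit
    return total


# 13 x 11 with 4-bit binary encoding
product_digits = convolve(to_digits(13, 4), to_digits(11, 4))
print(product_digits)                       # [1, 1, 1, 3, 1, 1, 1]
print(resolve_mixed_radix(product_digits))  # 143
```

The same functions accept `radix=4`, mirroring the multi-level encoding mentioned above: with two base-4 digits, 13 x 11 gives the mixed-radix sequence [3, 11, 6], which again resolves to 143. Mixed-radix digits can exceed radix - 1, which is presumably why the P3 output passes through an A/D converter before the adder.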

Next, we consider all M processor channels of the system and discuss how the
system forms the vector inner product (VIP) a^T b of two M-element vectors. The M
elements of vector a are fed in parallel to P1, with each P1 point modulator fed serially
(in time) with the encoded representation of one element as discussed before. Each
vertical region of P2 (there are M such regions) contains the encoded representation of
one element of the M-vector b, with the N bits of that element placed horizontally
across P2, each bit in a different AO channel. Each processor channel performs a
high-accuracy scalar product, and all M channels operate simultaneously. The P3 output is
the sum of these M different products formed on the M channels, or equivalently, the
VIP a^T b. The number of bits of accuracy required in an application determines the
number of AO channels N needed at P2, the rate 1/T1 at which encoded vector data
need to be fed to P1, and the specifications of the P3 adder and A/D converter. A
reduced accuracy requirement can significantly increase speed (the data input rate at P1)
and reduce cost (the number of P2 channels). In our EKF application, the bit accuracy
required as a function of the algorithm used will be considered. When multi-level
encoding is used, N and 1/T1 can be reduced (saving cost) or 1/T1 can be increased
(improving speed). The computation of one M-element VIP (M multiplications and
M-1 additions) accurate to 32 bits can be achieved every 0.1 μs using N = 32
channels (using binary encoding) or with only N = 5 channels (using base-4 encoding).
With M = 10 (a modest number, and currently in use in our prototype), this system
achieves about 20 operations per 0.1 μs, i.e., about 200 MOPS. The system in Figure 1 has other
attractive features: it allows easy partitioning of a given problem, and only one
processor channel of the system is required to realize a unique implementation of L U
decomposition [3].
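Extending the same arithmetic model to M channels gives a rough picture of how P3 forms the VIP a^T b: each channel convolves the digits of one element pair, the channel outputs add on P3 while still in mixed-radix form, and a single conversion happens at the end. This is a NumPy sketch under stated assumptions (integer elements and a fixed number of digits per element; `digit_matrix` and `optical_vip` are hypothetical names). For reference, one M-element VIP is M multiplications plus M-1 additions, so M = 10 gives 19 operations, which the text rounds to 20 per 0.1 μs, or about 200 MOPS.

```python
import numpy as np


def digit_matrix(values, ndigits, radix):
    """One row of least-significant-first base-`radix` digits per vector element."""
    values = np.asarray(values, dtype=np.int64)[:, None]
    powers = radix ** np.arange(ndigits, dtype=np.int64)
    return (values // powers) % radix


def optical_vip(a, b, ndigits, radix=2):
    """Model of the summed P3 output for a^T b, followed by one conversion."""
    da = digit_matrix(a, ndigits, radix)
    db = digit_matrix(b, ndigits, radix)
    # Processor channel m convolves the digits of a[m] with those of b[m];
    # all M channels work in parallel and their outputs add on P3.
    p3 = sum(np.convolve(da[m], db[m]) for m in range(len(a)))
    # A single shift/add (written here as a weighted sum) resolves the result.
    weights = radix ** np.arange(p3.size, dtype=np.int64)
    return p3, int(p3 @ weights)


p3, value = optical_vip([3, 5, 7], [2, 4, 6], ndigits=3)
print(p3, value)   # [0 2 4 2 2] 68
```

Only one conversion is needed regardless of M, which matches the single adder and A/D converter on the P3 output line described above.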

The system also allows for the processing of floating-point data. In this case, the
product mantissa is calculated optically and the exponent is handled by dedicated
external circuitry. Each vector element is expressed as a normalized mantissa and an
exponent as in the conventional floating-point representation. For an M-element VIP,
the largest of the M individual product exponents is found (its use is explained
subsequently). Each multiplicand is then appropriately delayed (see below) and fed to
the associated P1 channel.

To demonstrate these points, we consider a specific example. Let

$$(a)_{10} = (a_3.a_2 a_1)_r \times r^{p_a},$$

where

• (a)_10 is the vector element in base 10 notation,

• r is the base in the encoded representation of a,

• p_a is the exponent of the normalized base-r representation of (a)_10,

• (a_3.a_2 a_1)_r is the base-r normalized mantissa of (a)_10, and

• a_j is a single base-r digit of the normalized mantissa.

Note: by "normalized" we mean that a_3 is the only digit to the left of the radix
point and a_3 must be non-zero (except for (a)_10 = 0). As our example, we
consider the binary representation (i.e., r = 2) of five numbers:

(1.0)_10 = (1.00)_2 × 2^0
(1.5)_10 = (1.10)_2 × 2^0
(2.0)_10 = (1.00)_2 × 2^1
(2.5)_10 = (1.01)_2 × 2^1
(3.5)_10 = (1.11)_2 × 2^1

We use these representations and consider the VIP

$$[\,3.5 \;\; 1.0 \;\; 2.5\,]\,[\,2.0 \;\; 1.5 \;\; 1.0\,]^T = 11.0.$$


By simple addition, the individual product exponents are found to be 2, 0, and 1,
respectively, from which the largest exponent is p_max = 2, corresponding to the product
of 3.5 and 2.0. The mantissas of the first vector (the multipliers) are loaded directly
into P2 (see Table 1). The mantissas of the second vector elements (the multiplicands) are
loaded into P1, with the mantissa of vector element i delayed by Δ_i T1, where Δ_i is the
difference between the largest product exponent, p_max, and the i-th individual product
exponent, and Δ_max is the largest such difference (see Table 2).

The encoded and shifted mantissas are multiplied and summed, and the VIP
result in mixed-radix form is obtained from P3 as a function of time as:

Output time:    7T1   6T1   5T1   4T1   3T1   2T1   1T1
Output digit:    1     2     2     2     0     0     0

This mixed binary representation is converted to the VIP mantissa value as

$$1 \cdot 2^{0} + 2 \cdot 2^{-1} + 2 \cdot 2^{-2} + 2 \cdot 2^{-3} + 0 \cdot 2^{-4} + 0 \cdot 2^{-5} + 0 \cdot 2^{-6} = 2.75.$$

The mantissa 2.75 is then multiplied by r^{p_max} = 2^2 to yield the VIP result
11.0. In this system, the time needed to compute a complete floating-point VIP product
mantissa is increased from N T1 by Δ_max T1 (the time by which the multiplicand
mantissa associated with the smallest individual product must be delayed). It should be
noted that the digital realization of floating-point operations requires a similar increase
in time. The optical system realization has the advantages of parallelism, global
operations, the use of non-binary encoding, etc.
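The floating-point example can be reproduced with exact rational arithmetic. The sketch below is an assumption-laden model, not the processor's data path: it indexes mixed-radix digits by weight (position n carries weight r^-n) rather than by input time slot, shifts each product toward lower weight by Δ_i = p_max - p_i as defined in the text, and scales the resulting mantissa sum by r^{p_max}. The function names are hypothetical. Because it works in digit positions, it does not depend on which end of each bit stream enters P1 first.

```python
from fractions import Fraction


def normalize(x, radix=2):
    """(mantissa, exponent) with 1 <= mantissa < radix, for x > 0."""
    m, p = Fraction(x), 0
    while m >= radix:
        m, p = m / radix, p + 1
    while m < 1:
        m, p = m * radix, p - 1
    return m, p


def mantissa_digits(m, nfrac, radix=2):
    """Most-significant-first digits a_3 . a_2 a_1 ... of a normalized mantissa."""
    scaled = m * radix**nfrac
    assert scaled.denominator == 1, "mantissa needs more fractional digits"
    n = int(scaled)
    return [(n // radix**i) % radix for i in reversed(range(nfrac + 1))]


def float_vip(a, b, nfrac=2, radix=2):
    """Mixed-radix digits, mantissa sum, and value of a^T b (all elements > 0)."""
    na = [normalize(x, radix) for x in a]
    nb = [normalize(x, radix) for x in b]
    exps = [pa + pb for (_, pa), (_, pb) in zip(na, nb)]   # product exponents
    p_max = max(exps)
    deltas = [p_max - p for p in exps]                      # Delta_i = p_max - p_i
    out = [0] * (2 * nfrac + 1 + max(deltas))               # position n: weight r**-n
    for (ma, _), (mb, _), delta in zip(na, nb, deltas):
        for i, x in enumerate(mantissa_digits(ma, nfrac, radix)):
            for j, y in enumerate(mantissa_digits(mb, nfrac, radix)):
                out[delta + i + j] += x * y                 # carry-free accumulation
    mantissa = sum(Fraction(d, radix**n) for n, d in enumerate(out))
    return out, mantissa, mantissa * Fraction(radix) ** p_max


out, mantissa, value = float_vip([3.5, 1.0, 2.5], [2.0, 1.5, 1.0])
print(out, mantissa, value)   # [1, 2, 2, 2, 0, 0, 0] 11/4 11
```

The digit sequence matches the P3 output listed above, and the extra Δ_max = 2 positions in `out` correspond to the added Δ_max T1 of processing time.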


3  Kalman Filter and Extended Kalman Filter Review

The objective of a KF and an EKF [15]-[18] is to estimate optimally (in the
least-mean-squares sense) the state vector x(t) of a dynamic system. We briefly detail
the computational steps in a KF and an EKF in this section. Table 3 lists the notation
and the dimensions of all quantities. The dynamics of the tracked system are modeled
by:

$$\dot{x}(t) = A\,x(t) + C\,u(t) + w(t). \qquad (2)$$

The measurements z(t) and state vector x(t) are related by:

$$z(t) = H\,x(t) + v(t). \qquad (3)$$

In all algebraic processors (optical or digital), new data are presented at discrete-time
instants. We thus discretize the system dynamics model in (2) at the outset. By
applying the standard exact discretization algorithm [16, 10, 20], the discrete-time
system dynamics model and measurement model in (2) and (3) become:


$$x_{k+1} = \Phi\,x_k + \Gamma\,u_k + w_k, \qquad (4)$$

$$z_k = H\,x_k + v_k, \qquad (5)$$

where kT are the sampling times, T is the sampling period, Φ = exp(AT),
Γ = ∫_0^T exp(At) C dt, and w_k and v_k are independent zero-mean white Gaussian
noise vectors. The measurement noise v_k is v(t) evaluated at the sample time t = kT.
We refer the reader to Reference [16] for the relationship between w(t) and w_k.
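The exact discretization Φ = exp(AT), Γ = ∫_0^T exp(At) C dt can be computed with one matrix exponential of an augmented matrix (a standard identity often attributed to Van Loan; the source does not say which method it uses). The NumPy sketch below includes a small scaling-and-squaring Taylor `expm` only to stay self-contained; in practice `scipy.linalg.expm` would be the usual choice. Discretizing the noise covariance, which the text defers to [16], is not covered.

```python
import numpy as np


def expm(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0**squarings            # scale so the series converges quickly
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ X / k           # X^k / k!
        result = result + term
    for _ in range(squarings):        # undo the scaling: exp(M) = exp(X)^(2^s)
        result = result @ result
    return result


def discretize(A, C, T):
    """Phi = exp(A T) and Gamma = int_0^T exp(A t) C dt from one augmented exponential.

    exp([[A, C], [0, 0]] T) = [[Phi, Gamma], [0, I]].
    """
    n, m = C.shape
    augmented = np.zeros((n + m, n + m))
    augmented[:n, :n] = A
    augmented[:n, n:] = C
    E = expm(augmented * T)
    return E[:n, :n], E[:n, n:]


# Double integrator (position, velocity) driven by an acceleration input
Phi, Gamma = discretize(np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [1.0]]), 0.1)
```

For this double integrator the result is Φ = [[1, T], [0, 1]] and Γ = [[T^2/2], [T]], the familiar constant-acceleration discretization.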
The KF is an observer which takes into account the additive noise disturbances w_k
and v_k, and produces the least-mean-square estimate x̂_k of the state vector x_k
from the new measurements z_k and the a priori estimate x̄_k calculated at time k-1
(before the present measurement was available). The discrete-time KF equations are
listed in Table 4.

We notice that the second-order error covariance matrix calculations for P_k^{-1} and
M_{k+1} in (10) and (13), respectively, are independent of the state estimates x̂_k and x̄_k.
Hence, from an initial error covariance matrix M_0 or P_0 at k = 0 and the noise
statistics Q_k and R_k for all time k, we can precompute and store the KF gain matrix
K_k for all time k. We therefore need to implement only (12) and (14) in real time to
compute x̂_k and x̄_{k+1}, and we never need to compute the second-order state estimate
statistics P_k and M_{k+1} in real time. This luxury of precomputability applies only to the
case of linear system and measurement models and known noise statistics.
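For the linear KF, the precomputability argument can be made concrete: the gain sequence comes from (10), (11), and (13) alone, with no measurement data, and the on-line loop needs only (12) and (14). This NumPy sketch assumes time-invariant Φ, Γ, H, Q, and R for brevity and uses explicit matrix inverses for clarity rather than numerical robustness; the function names are illustrative.

```python
import numpy as np


def precompute_gains(Phi, H, Q, R, M0, steps):
    """Off-line covariance recursion (10), (11), (13): no measurements needed."""
    gains = []
    M = M0
    R_inv = np.linalg.inv(R)
    for _ in range(steps):
        P = np.linalg.inv(np.linalg.inv(M) + H.T @ R_inv @ H)   # (10)
        gains.append(P @ H.T @ R_inv)                            # (11)
        M = Phi @ P @ Phi.T + Q                                  # (13)
    return gains


def run_filter(Phi, Gamma, H, gains, x_bar0, zs, us):
    """On-line part: only (12) and (14), i.e. matrix-vector products."""
    x_bar = x_bar0
    estimates = []
    for K, z, u in zip(gains, zs, us):
        x_hat = x_bar + K @ (z - H @ x_bar)     # (12)
        estimates.append(x_hat)
        x_bar = Phi @ x_hat + Gamma @ u         # (14)
    return estimates
```

For a scalar random walk with Φ = H = 1, Q = 0, R = 1, and M_0 = 1, the stored gains are 1/2, 1/3, 1/4, ..., and with a prior estimate of 0 the filter reduces to a running average of the prior and the measurements.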

The EKF algorithm is compiled in Table 5. An EKF, rather than a KF, is used
when either the system model or the measurement model is nonlinear. In a missile
tracking application, the state vector x_k is in Cartesian coordinates, whereas the
measurement vector z_k is in polar coordinates, so the measurement model is nonlinear; i.e., the
measurements are bearing angles and range, and these are nonlinearly related to the
Cartesian coordinates of position, velocity, and acceleration. The coordinate
transformation vector h(x_k) in (15) and (18) represents this conversion. In an EKF,
the linearized transformation matrix H[x̄_k] in the P_k^{-1} and K_k equations (16) and (17)
now depends on the a priori estimate x̄_k. We thus cannot precompute P_k^{-1}, K_k, and
M_{k+1}. Since we cannot precompute K_k, all of the calculations in (13), (14), and
(16)-(18) must be performed on-line during one sample period T for each new
measurement. There is flexibility in the ordering of the operations in Table 5. For
data flow and computation time efficiency, we implement Table 5 in the order (16),
(17), (18), (14), and (13) for each time index k.
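One EKF cycle in the order (16), (17), (18), (14), (13) can be sketched as follows. The measurement model here is a hypothetical 2-D stand-in for the missile-tracking case: the state is [x, y, v_x, v_y], `h` returns range and bearing, and `H_jacobian` is its Jacobian evaluated at the a priori estimate. This unfactored version forms full inverses, exactly what the factorized algorithm of the next section is designed to avoid; it is meant only to show where the dependence on x̄_k enters.

```python
import numpy as np


def h(x):
    """Hypothetical polar measurement: range and bearing of the position (x[0], x[1])."""
    return np.array([np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])])


def H_jacobian(x):
    """Linearized measurement matrix H[x] = dh/dx evaluated at x."""
    px, py = x[0], x[1]
    r2 = px**2 + py**2
    r = np.sqrt(r2)
    H = np.zeros((2, x.size))
    H[0, 0], H[0, 1] = px / r, py / r          # d(range)/d(position)
    H[1, 0], H[1, 1] = -py / r2, px / r2       # d(bearing)/d(position)
    return H


def ekf_cycle(x_bar, M, z, u, Phi, Gamma, Q, R):
    """One unfactored EKF step in the order (16), (17), (18), (14), (13)."""
    H = H_jacobian(x_bar)                      # depends on the a priori estimate
    R_inv = np.linalg.inv(R)
    P = np.linalg.inv(np.linalg.inv(M) + H.T @ R_inv @ H)        # (16)
    K = P @ H.T @ R_inv                                            # (17)
    innovation = z - h(x_bar)
    innovation[1] = (innovation[1] + np.pi) % (2 * np.pi) - np.pi  # wrap bearing
    x_hat = x_bar + K @ innovation                                 # (18)
    x_bar_next = Phi @ x_hat + Gamma @ u                           # (14)
    M_next = Phi @ P @ Phi.T + Q                                   # (13)
    return x_hat, P, x_bar_next, M_next
```

Because `H_jacobian(x_bar)` changes with every new a priori estimate, none of P, K, or M_next can be tabulated in advance, which is the point made in the paragraph above.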

4  Factorized EKF Algorithm

Bierman and Thornton [4]-[7] provided comparative studies on the numerical
accuracy and efficiency of KF formulations using different factorized implementations.
They found the factorized versions to be more numerically stable (i.e., the covariance
matrices M_k and P_k always remained positive definite) and to require less
computational precision. The basic concept in such an implementation is to factor M_k
and P_k and to update only their factors at each time step k. Our factorized algorithm
is unique in four ways:

1. We designed all steps to use matrix-vector operations rather than scalar
operations.

2. We eliminate the need for most full matrix-matrix operations and thus
simplify the processor and the algorithm to be implemented.

3. We implement an L D L^T rather than an L L^T factorization or Householder
factorizations as used by others [21, 22]. L L^T factorizations were not used
because they require a square-root operation, which is not attractive.
Householder factorizations were not employed because they are more
complicated to achieve on a VIP processor. An L D L^T factorization was
selected because it is straightforward to implement using only one channel of
the system in Figure 1. Bierman and Thornton apply the Givens algorithm,
whereas our implementation of the LU decomposition algorithm [3], applied
to the L D L^T decomposition, operates on one row and column of the matrix
in parallel, thus making more efficient use of our optical processor's parallel
capabilities.

4. Bierman and Thornton compared the numerical accuracies of double-precision
and single-precision computations. However, they computed all
vector inner products using double-precision arithmetic and then rounded
the results to single precision, and they stored state estimates in double
precision. In contrast, we perform all computations using single-precision
(floating-point) arithmetic.
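A square-root-free L D L^T factorization of a symmetric positive definite matrix is sketched below in NumPy. It is the textbook column-oriented recurrence, not necessarily the exact sequence of the authors' L U-based algorithm [3]; each pass computes one pivot and then an entire column below it as a vector operation, in the spirit of item 3 above. No square roots appear, which is the property that motivated the choice over L L^T.

```python
import numpy as np


def ldl(A):
    """Square-root-free factorization A = L diag(d) L^T of a symmetric positive definite A.

    L is unit-diagonal lower triangular; d holds the diagonal of D.
    """
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Pivot: d_j = a_jj - sum_k L_jk^2 d_k (a vector inner product)
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        # Entire column j below the pivot as one matrix-vector operation
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ (d[:j] * L[j, :j])) / d[j]
    return L, d


A = np.array([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])
L, d = ldl(A)
print(d)   # [4. 4. 4.]
```

Here L has 0.5 in every below-diagonal position, and L diag(d) L^T reproduces A exactly.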

In Table 6, we outline the algorithmic development and computational steps to
obtain the factors of M_{k+1} in (13). The method we use is to evaluate (13) as

$$M_{k+1} = \Phi P_k \Phi^T + q_1 b_1 b_1^T + \cdots + q_n b_n b_n^T,$$

where q_i and b_i are defined in Table 6. This evaluation is performed recursively by
adding one of the vector outer products (VOPs) q_i b_i b_i^T at a time as:

$$G_{i+1} = G_i + q_i\, b_i b_i^T \quad \text{for } i = 1, 2, \ldots, n, \qquad (6)$$


where i is the iteration index, and not the time index k, and G_1 = Φ P_k Φ^T. We seek
the L D L^T factors of M_{k+1}, i.e., the factors of G_{i+1} in (6) at i = n, and not the full
M_{k+1} or G_i matrices. We thus write G_i = L_{G_i} D_{G_i} L_{G_i}^T and
G_{i+1} = L_{G_{i+1}} D_{G_{i+1}} L_{G_{i+1}}^T. (The factors L_{G_i} and D_{G_i} are defined in Table 6.) The
recursion in (6) can thus be written as

$$G_{i+1} = L_{G_i}\left(D_{G_i} + q_i\, d_i d_i^T\right) L_{G_i}^T = L_{G_i} S_i L_{G_i}^T, \qquad (7)$$

where d_i is the solution of L_{G_i} d_i = b_i and S_i = D_{G_i} + q_i d_i d_i^T = L_{S_i} D_{S_i} L_{S_i}^T. We
implement our algorithm by the recursive computation in (7). These steps are
performed as follows: L_{G_i} and D_{G_i} are available from the previous recursion i-1, and S_i is
computed at each recursion i. The L D L^T factors of S_i are then computed, and the
new L_G and D_G factors for G_{i+1} are then easily obtained (updated) as
L_{G_{i+1}} = L_{G_i} L_{S_i} and D_{G_{i+1}} = D_{S_i}. Since each matrix in (7) is described by its
L D L^T factors, updating the factors of G_i at each iteration is easily achieved.


We repeat (7) until i = n (the number of estimated states) and thus obtain the
desired L D L^T factors of M_{k+1} in (13). The programming for the L D L^T
algorithm to compute L_M and D_M for M_{k+1} is presented in the Computation section
of Table 6. Steps 1-3 compute L_G and D_G for G_1 and are thus the initialization
steps. Steps 5-10 compute L_G and D_G for one iteration i. These steps are repeated
until i = n, with L_G and D_G updated at each i as in Steps 9 and 10. All L_{G_i} occupy the
same physical memory location, and similarly for all D_{G_i}. Our algorithm thus makes
efficient use of memory storage. As shown, all operations, except those involving S, use
only diagonal and triangular matrices (thus simplifying the processor, algorithm, and
data flow). The S matrix is full, but computation of its L_S and D_S factors is quite
simple using a Cholesky decomposition modification of our L U algorithm [2, 3], which
is especially easy to realize on only one channel of the system in Figure 1. The
algorithm's steps, operations, data flow, and memory requirements are thus most
attractive.
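The Table 6 recursion can be exercised numerically with the NumPy sketch below. It follows the table's steps under its stated assumption that Φ is lower triangular with a non-zero diagonal; `ldl` is a compact square-root-free factorization standing in for the Cholesky-type Step 8, and `np.linalg.solve` stands in for the triangular back-substitution of Step 5 (a dedicated triangular solver would exploit the structure). The function names are hypothetical.

```python
import numpy as np


def ldl(A):
    """Square-root-free A = L diag(d) L^T for symmetric positive definite A."""
    n = A.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ (d[:j] * L[j, :j])) / d[j]
    return L, d


def time_update_factors(Phi, L_P, D_P, L_Q, D_Q):
    """L_M, D_M with L_M D_M L_M^T = Phi P_k Phi^T + Q_k, following Table 6.

    Assumes Phi is lower triangular with a non-zero diagonal (as Table 6 does),
    P_k = L_P D_P L_P^T and Q_k = L_Q D_Q L_Q^T with D_P, D_Q diagonal matrices.
    """
    n = Phi.shape[0]
    phi_diag = np.diag(Phi)
    D_tilde = np.diag(1.0 / phi_diag)             # reciprocals of Phi's diagonal
    L_G = (Phi @ L_P) @ D_tilde                   # Steps 1-2: unit lower triangular
    D_G = np.diag(phi_diag**2 * np.diag(D_P))     # Step 3: D~^-1 D_P D~^-1
    for i in range(n):                            # Steps 4-11, one VOP per pass
        b_i, q_i = L_Q[:, i], D_Q[i, i]
        d_i = np.linalg.solve(L_G, b_i)           # Step 5: solve L_G d_i = b_i
        S = D_G + q_i * np.outer(d_i, d_i)        # Steps 6-7: the only full matrix
        L_S, d_S = ldl(S)                         # Step 8
        D_G = np.diag(d_S)                        # Step 9
        L_G = L_G @ L_S                           # Step 10
    return L_G, D_G                               # Step 11: L_M, D_M
```

Every product in the loop involves only triangular or diagonal factors except the formation and factorization of S, which is the structure the text emphasizes.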

In order to calculate the a priori covariance described above, we need the
factors L_P and D_P of the a posteriori covariance P_k. We now consider that portion of
the EKF algorithm which computes these L_P and D_P factors. Table 7 specifies our
notation, algorithmic development, and the on-line update program. These steps
parallel those in Table 6. Equation (16) gives an expression for P_k^{-1}, whereas we desire
instead the factors of P_k. Thus, we apply the matrix inversion lemma of (20) (see
Table 7) and obtain the recursive formula in (21) to compute P_k by summing r VOPs,
where r is the number of measurements. As before, this requires that we process only
one VOP at a time. The recursions are performed as in (22), with the new L_G and D_G
computed as in (23). After r iterations, the factors L_P and D_P (which we desire) of P_k
are obtained.
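The Table 7 measurement update can be sketched the same way. Under the table's assumption that R_k is diagonal, each measurement contributes one rank-one downdate with a negative α, yet S_i stays positive definite, so the square-root-free factorization still applies. `ldl` is again a stand-in for the Cholesky-type Step 9, and the function names are hypothetical; the check compares the factors against P_k computed from (16) by explicit inversion.

```python
import numpy as np


def ldl(A):
    """Square-root-free A = L diag(d) L^T for symmetric positive definite A."""
    n = A.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ (d[:j] * L[j, :j])) / d[j]
    return L, d


def measurement_update_factors(L_M, D_M, H, r_diag):
    """L_P, D_P with L_P D_P L_P^T = (M_k^-1 + H^T R_k^-1 H)^-1, following Table 7.

    Assumes R_k = diag(r_diag); row i of H is h_i^T (column i of H^T).
    """
    L_G, D_G = L_M.copy(), D_M.copy()             # Step 2
    for i in range(H.shape[0]):                   # one measurement (VOP) per pass
        tmp = L_G.T @ H[i]                        # Step 3
        d_i = D_G @ tmp                           # Step 4: d_i = D_G L_G^T h_i
        hGh = tmp @ d_i                           # Step 5: h_i^T G_i h_i
        alpha = -1.0 / (hGh + r_diag[i])          # Step 6
        S = D_G + alpha * np.outer(d_i, d_i)      # Steps 7-8
        L_S, d_S = ldl(S)                         # Step 9
        D_G = np.diag(d_S)                        # Step 10
        L_G = L_G @ L_S                           # Step 11
    return L_G, D_G                               # Step 12: L_P, D_P
```

No matrix inverse is formed anywhere in the loop; the only division per measurement is the scalar one in Step 6.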

The calculations of the P_k factors (Table 7) and the M_{k+1} factors (Table 6) are
the most computationally intensive steps in the EKF algorithm (Table 5). Once the P_k
factors have been obtained, we compute the state estimate measurement-update (18) by
first substituting the expression for the gain matrix (17) into (18) and performing the
computations in the resultant equation from right to left (thereby requiring only
matrix-vector operations). Computation of (14) is straightforward. We then compute (13) as
in Table 6. This completes all of the EKF steps of Table 5. To summarize, these
equations are implemented in the order: a form of (16), a combination of (17) and
(18), (14), and (13).
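Substituting (17) into (18) gives x̂_k = x̄_k + L_P D_P L_P^T H^T R_k^{-1} (z_k - h(x̄_k)). Evaluated right to left, this is a chain of matrix-vector products, and the gain matrix K_k is never formed. The NumPy sketch below assumes a diagonal R_k and D_P stored as a vector; the function name is illustrative.

```python
import numpy as np


def state_update_right_to_left(x_bar, innovation, L_P, d_P, H, r_diag):
    """x_hat = x_bar + L_P D_P L_P^T H^T R^{-1} innovation, evaluated right to left.

    innovation = z_k - h(x_bar_k); R_k is assumed diagonal (entries r_diag) and
    D_P is stored as the vector d_P, so every step is a vector operation.
    """
    t = innovation / r_diag   # R^{-1} applied elementwise
    t = H.T @ t               # r-vector -> n-vector
    t = L_P.T @ t             # triangular matrix-vector product
    t = d_P * t               # diagonal scaling
    t = L_P @ t               # triangular matrix-vector product
    return x_bar + t
```

The result equals x̄_k + K_k (z_k - h(x̄_k)) with K_k from (17); only the order of evaluation differs.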

The key computational steps in Tables 6 and 7 are vector, not scalar,
operations. Such detail, formulation, and development are required for data flow
analysis and for evaluation of the computation time required. These aspects are unique
in this work, where we consider both the algorithm and the processor architecture to be
used for implementation. We note that (Φ L_P D̃) and all L matrices are unit-diagonal
lower triangular matrices, and D̃ and all D matrices are diagonal. All operations
(except Steps 7 and 8 in Table 6, and Steps 8 and 9 in Table 7) involve only triangular
and diagonal matrices and are thus easily achieved with high accuracy and efficient
data flow. The L_S and D_S decompositions required (Step 8 in Table 6 and Step 9 in
Table 7) involve the full matrix S, but are easily calculated [2, 3] using only one
channel of the system in Figure 1. The matrices (D̃^{-1})^2, Φ, Γ, L_Q, D_Q, and R are
precomputed off-line once and stored in ROM. The data flow and computational
sequence are such that all L_M, L_P, and L_G matrices can occupy the same physical
storage location (and similarly for all D_M, D_P, and D_G matrices). The operations
required in each time step k in the entire EKF can be performed in 1 msec on the VIP
processor of Figure 1. In a subsequent publication, we will detail the performance of
this EKF algorithm for several case studies.

5  Summary and Conclusions

A high-accuracy floating-point optical linear algebra processor has been discussed
and its use in EKF processing detailed. A new factorized EKF algorithm was advanced
and detailed. All steps in this algorithm were formulated as vector-intensive operations,
with attention to data flow, the hardware processor, and the algorithm. All steps
required in one cycle of an EKF can be performed in 1 msec on the system described,
thus allowing a new measurement every millisecond.
Figure 1: Simplified Schematic of a High-Accuracy Vector Inner Product
Processor (N-bit Accuracy, M-element Vector) [3].
Table 1: Timed Inputs to the AO Cell Channels at P2

                        AO channel contents
Time    Multiplier      a3    a2    a1
1T2     3.5             1     1     1
2T2     1.0             1     0     0
3T2     2.5             1     0     1

Table 2: Timed Inputs to Point Modulators at P1

Point        Multiplicand   Delay   Input time slots
modulator                   Δ       5T1   4T1   3T1   2T1   1T1
top          2.0            2       1     0     0     0d    0d
middle       1.5            0       0p    0p    1     1     0
bottom       1.0            1       0p    1     0     0     0d

where 0d are delay zeros and 0p are padding zeros.


Table 3: Definition of Symbols Used in Discrete-Time Linear and
Extended Kalman Filter Algorithms

Symbol     Dimension   Description
E          -           expectation operator
Γ          n × m       input distribution matrix
h(x_k)     r × 1       nonlinear coordinate transformation vector which transforms the
                       Cartesian coordinates of x_k to the polar coordinate frame of the
                       measurement vector z_k
H          r × n       linear coordinate transformation matrix (in KF)
H[x̄_k]     r × n       linearized measurement matrix (in EKF):
                       H[x̄_k] = ∂h[x_k]/∂x_k evaluated at x_k = x̄_k
k          -           discrete-time index
K_k        n × r       Kalman gain matrix
M_k        n × n       a priori error covariance matrix
m          -           number of inputs
n          -           number of states
P_k        n × n       a posteriori error covariance matrix
Φ          n × n       system matrix
Q_k        n × n       system driving noise covariance matrix of w_k
r          -           number of measurements
R_k        r × r       measurement noise covariance matrix of v_k
T          -           data sampling period
u_k        m × 1       input (control) vector
v_k        r × 1       measurement noise vector
w_k        n × 1       system noise vector
x_k        n × 1       system state vector
x̄_k        n × 1       a priori state vector estimate
x̂_k        n × 1       a posteriori state vector estimate
z_k        r × 1       measurement vector


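Table 4: System Model and Covariance Kalman Filter Algorithm

Linear System Model
$$x_{k+1} = \Phi x_k + \Gamma u_k + w_k \qquad (8)$$

Linear Measurement Model
$$z_k = H x_k + v_k \qquad (9)$$

Error Covariance Measurement-Update
$$P_k^{-1} = M_k^{-1} + H^T R_k^{-1} H \qquad (10)$$

Gain Matrix
$$K_k = P_k H^T R_k^{-1} \qquad (11)$$

State Estimate Measurement-Update
$$\hat{x}_k = \bar{x}_k + K_k\,(z_k - H \bar{x}_k) \qquad (12)$$

Error Covariance Time-Update
$$M_{k+1} = \Phi P_k \Phi^T + Q_k \qquad (13)$$

State Estimate Time-Update
$$\bar{x}_{k+1} = \Phi \hat{x}_k + \Gamma u_k \qquad (14)$$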

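Table 5: System Model and Covariance Extended Kalman Filter Algorithm

Linear System Model
$$x_{k+1} = \Phi x_k + \Gamma u_k + w_k \qquad (8)$$

Nonlinear Measurement Model
$$z_k = h(x_k) + v_k \qquad (15)$$

Error Covariance Measurement-Update
$$P_k^{-1} = M_k^{-1} + H^T[\bar{x}_k]\, R_k^{-1}\, H[\bar{x}_k] \qquad (16)$$

Gain Matrix
$$K_k = P_k H^T[\bar{x}_k]\, R_k^{-1} \qquad (17)$$

State Estimate Measurement-Update
$$\hat{x}_k = \bar{x}_k + K_k\,(z_k - h(\bar{x}_k)) \qquad (18)$$

Error Covariance Time-Update
$$M_{k+1} = \Phi P_k \Phi^T + Q_k \qquad (13)$$

State Estimate Time-Update
$$\bar{x}_{k+1} = \Phi \hat{x}_k + \Gamma u_k \qquad (14)$$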

Table 6: Computation of M_{k+1}

Development

• $M_{k+1} = \Phi P_k \Phi^T + Q_k = \Phi P_k \Phi^T + L_Q D_Q L_Q^T$

• Column-wise partition L_Q as $L_Q = [\,b_1 \cdots b_n\,]$, where b_i is an n-vector.

• Write D_Q as $D_Q = \mathrm{diag}[\,q_1 \cdots q_n\,]$, where each q_i is a scalar.

• $M_{k+1} = \Phi P_k \Phi^T + q_1 b_1 b_1^T + \cdots + q_n b_n b_n^T$

• Let $G_1 = \Phi P_k \Phi^T = (\Phi L_P \tilde{D})(\tilde{D}^{-1} D_P \tilde{D}^{-1})(\Phi L_P \tilde{D})^T = L_{G_1} D_{G_1} L_{G_1}^T$, where D̃ is a
diagonal matrix whose non-zero elements are the reciprocals of the corresponding diagonal
elements of Φ.

• $M_{k+1} = G_{n+1}$

• In recursive form,
$$G_{i+1} = G_i + q_i b_i b_i^T = L_{G_i}\,(D_{G_i} + q_i d_i d_i^T)\,L_{G_i}^T.$$
We first form $S_i = D_{G_i} + q_i d_i d_i^T$ (where $L_{G_i} d_i = b_i$), and then decompose S_i as
$S_i = L_{S_i} D_{S_i} L_{S_i}^T$. Thus, $L_{G_{i+1}} = L_{G_i} L_{S_i}$ and $D_{G_{i+1}} = D_{S_i}$.

• The following are unit-diagonal lower triangular matrices: L_G, (Φ L_P D̃) (Φ by itself is lower
triangular but not unit diagonal), L_P, Tmp1 (in Step 1 below), L_S, and L_M.

• The following are diagonal matrices: D_Q, D̃, D_G, D_P, D_S, and D_M.

• The following are full matrices: S, Tmp1 (in Steps 6 and 7 below).

Computation

1. Tmp1 ← Φ L_P
2. L_G ← Tmp1 D̃
3. D_G ← D̃^{-1} D_P D̃^{-1}
4. i ← 1
5. Back substitute L_G d_i = b_i for d_i.
6. Tmp1 ← q_i d_i d_i^T
7. S ← D_G + Tmp1
8. Compute the Cholesky factors L_S and D_S of S.
9. D_G ← D_S
10. L_G ← L_G L_S
11. If i = n, then L_M ← L_G and D_M ← D_G; else i ← i + 1 and go to Step 5.


Table 7: Computation of P_k

Development

• $P_k^{-1} = M_k^{-1} + H^T[\bar{x}_k]\, R_k^{-1}\, H[\bar{x}_k]$   (16)

• Column-wise partition $H^T[\bar{x}_k] = [\,h_1 \cdots h_r\,]$, where h_i is an n-vector.

• Let $R_k^{-1} = \mathrm{diag}[\,r_1^{-1} \cdots r_r^{-1}\,]$.

• Rewrite (16) as $P_k^{-1} = M_k^{-1} + r_1^{-1} h_1 h_1^T + \cdots + r_r^{-1} h_r h_r^T$.

• We will process one vector outer product at a time; therefore we use the notation
$$G_{i+1}^{-1} = G_i^{-1} + r_i^{-1} h_i h_i^T, \qquad (19)$$
where $G_1^{-1} = M_k^{-1}$ and $P_k^{-1} = G_{r+1}^{-1}$. (Subscripts k and i are time and iteration indices,
respectively.)

• Take the inverse of both sides of (19) and apply the matrix inversion lemma
$$(A + X C Y^T)^{-1} = A^{-1} - A^{-1} X\,(C^{-1} + Y^T A^{-1} X)^{-1}\,Y^T A^{-1} \qquad (20)$$
with the substitutions A = G_i^{-1}, X = Y = h_i, and C = r_i^{-1} to obtain
$$G_{i+1} = G_i + \alpha\,(G_i h_i)(G_i h_i)^T, \qquad (21)$$
where the scalar α satisfies $1/\alpha = -(r_i + h_i^T G_i h_i)$.

• Let $G_i = L_{G_i} D_{G_i} L_{G_i}^T$ and $d_i = D_{G_i} L_{G_i}^T h_i$. Then (21) becomes
$$L_{G_{i+1}} D_{G_{i+1}} L_{G_{i+1}}^T = L_{G_i}\,(D_{G_i} + \alpha\, d_i d_i^T)\,L_{G_i}^T. \qquad (22)$$

• If $S_i = L_{S_i} D_{S_i} L_{S_i}^T$ is the factored form of the matrix $S_i = D_{G_i} + \alpha\, d_i d_i^T$, then
$$L_{G_{i+1}} = L_{G_i} L_{S_i} \quad \text{and} \quad D_{G_{i+1}} = D_{S_i}. \qquad (23)$$

Computation

1. i ← 1
2. L_G ← L_M and D_G ← D_M
3. Tmp1 ← L_G^T h_i
4. d_i ← D_G Tmp1 = D_G L_G^T h_i
5. Tmp1 ← (Tmp1)^T d_i
6. α ← -1/(Tmp1 + r_i)
7. Tmp1 ← α d_i d_i^T
8. S ← D_G + Tmp1
9. Compute the Cholesky factors L_S and D_S of S.
10. D_G ← D_S (D_S is the appropriate component of the left side of (21) at Step i+1)
11. L_G ← L_G L_S
12. If i = r, then L_P ← L_G and D_P ← D_G; else i ← i + 1 and go to Step 3.
ABSTRACT


This report will cover three independent developments in the field of computing.
Two of these developments are aimed at achieving extremely high-accuracy results
with only moderately accurate components. This appears to be a new direction in
optical computing. The third approach is also an entirely new direction in optical
computing. It involves a new type of optical device called an Optical Fredkin Gate.
Optical Fredkin Gates appear to have a virtually unlimited number of functions,
among them logical operations, memory, and interconnection. They have the
additional virtue of being, in principle, capable of extremely fast switching. We will
devote one section to each of these three developments.
I. TECHNICAL SUMMARY


1. OBJECTIVES

The purpose of this work was to devise new approaches to optical computing that
could have the effect of creating new opportunities for the use of optics in SDI. Three
new approaches were investigated, and all appear worthy of additional study. First, we
investigated the possibility of doing numerical algebra to high accuracy with relatively
low-accuracy but fast and inexpensive analog optical processors. This required
bootstrapping by doing a few relatively simple operations with electronic digital
processors. Both theory and simulation were studied. Convergence to highly accurate
solutions appears to be possible in a great many cases. Second, we thought of ways of
using relatively low-accuracy optical technologies to build a variety of neighborhood
processors and, in particular, cellular array processors; that is, rather arbitrary and
programmable nonlinear operations on images were studied. The methods devised
appear to be able to work at extremely high speeds and over extremely large
neighborhoods. Third, we devised a totally new type of optical computing component,
the Optical Fredkin Gate. Optical Fredkin Gates appear to be capable of performing any
logical operation, performing any delay or memory operation on a sequence of bits, or
doing any interconnect. They are, therefore, perhaps the most general optical computer
component devised. In some configurations, Optical Fredkin Gates have the potential of
operating at the source bandwidth limit (10^12 - 10^14 Hz).

2.  Description  of  Work  Performed 
(a)  Optical  Fredkin  Gates 

One of the limitations imposed on increasing computation power, be it electronic
or optical, stems from the large amount of energy that needs to be dissipated during
computer operation. Part of this energy is due to the intrinsic nature of the traditional
logic elements. This fact becomes evident if we recall that a conventional logic gate has
more input lines than output lines. Thus some of the information coming into the gate
is lost and cannot be retrieved. The irreversible nature of the gate makes it dissipative
not only in information but also in energy. In an effort to overcome these limitations,
Fredkin [2] proposed a new kind of logic gate which has the same number of output
lines as it has input lines. Fredkin gates are capable of performing conventional logic
operations while preserving all the original information. In contrast to conventional
logic gates, Fredkin gates may, in principle, be run backwards to regenerate the
original input signals.

The  purpose  of  this  work  is  to  introduce  the  Optical  Fredkin  Gate 
which  may  become  one  of  the  basic  building  blocks  of  an  optical  computer.  An 
overview of the main aspects of the Fredkin Gate is given in the next section,
followed  by  a  variety  of  proposed  optical  implementations.  A  number  of  useful 
applications  are  discussed  in  a  final  section. 

Background of the Fredkin Gates: The basic Fredkin Gate is defined as a black box having three binary inputs and three binary outputs (Fig. 1). The C-input, the control line, determines the operation of the gate on the other two inputs according to the following rules:

$$ \text{if } C = 0:\; A' = A,\; B' = B; \qquad \text{if } C = 1:\; A' = B,\; B' = A. \qquad (1) $$
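The rules of Eq. (1) are easy to check exhaustively in software. The Python sketch below is an illustration only (the function names are hypothetical, not from this report). It models the ideal gate as a controlled swap with the control passed through unchanged (C' = C), confirms that applying the gate twice restores the inputs and that the number of 1's is conserved, and shows the standard constructions of AND (B held at 0) and of NOT with fan-out (A = 1, B = 0).

```python
from itertools import product


def fredkin(c: int, a: int, b: int) -> tuple[int, int, int]:
    """Ideal Fredkin gate: pass A and B through if C == 0, exchange them if C == 1."""
    if c == 0:
        return c, a, b
    return c, b, a


# Reversibility: running the gate a second time regenerates every input triple.
for bits in product((0, 1), repeat=3):
    assert fredkin(*fredkin(*bits)) == bits

# Conservation: the number of 1's is unchanged, so no information is discarded.
for bits in product((0, 1), repeat=3):
    assert sum(fredkin(*bits)) == sum(bits)


def and_gate(x: int, y: int) -> int:
    # With B held at 0, the B' output equals C AND A.
    _, _, b_out = fredkin(x, y, 0)
    return b_out


def not_and_fanout(x: int) -> tuple[int, int]:
    # With A = 1 and B = 0, A' = NOT C and B' = C (a copy of the control).
    _, a_out, b_out = fredkin(x, 1, 0)
    return a_out, b_out


print([and_gate(x, y) for x, y in product((0, 1), repeat=2)])  # [0, 0, 0, 1]
print([not_and_fanout(x) for x in (0, 1)])                      # [(1, 0), (0, 1)]
```

Because AND and NOT (with fan-out) can both be built this way, a network of such gates can in principle realize any Boolean function, which is the basis for the generality claimed for the Optical Fredkin Gate.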

It is evident that this gate is reversible, i.e., it may be run backward to return to the original inputs, and therefore it is, in principle, non-


detector (photoconductor or photodiode-amplifier combination). Polarizing beamsplitters may be applied whenever a spatial separation is required between the A and B lines. The main advantage of this gate is its relative simplicity, while its disadvantage is the different nature of the C-line, which also changes level during transit through a gate (i.e., there is a lower light intensity in C' than in C; this effect may, however, be corrected by incorporating an amplifying medium on the line).

In Fig. 4 we show a schematic diagram of the acousto-optic gate. The two input lines are laser beams incident on an acousto-optic deflector (either bulk or integrated SAW) at the Bragg angle. If there is no acoustic signal (C = 0), the two beams continue unaffected (A' and B'), while if C is present each beam is deflected into the other channel. This is also a simple gate but, here too, the C line is basically different in nature from the other two lines. Nevertheless, this kind of gate can easily be cascaded and integrated. For example, a single acoustic pulse may activate many gates as it travels along the system.


The photorefractive gate, based on four-wave mixing, is an all-optical gate; one of its tentative implementations is illustrated in Fig. 5. In this case the C-line constitutes the two pump beams. The inputs A and B are transmitted if C is absent and phase-conjugated when the pump is present, resulting in a switching between the outputs.




In optical communication and integrated optical systems, a modulated waveguide or fiber coupler may serve as a Fredkin gate. Two general classes of this kind of gate may be implemented: out-of-plane control, shown schematically in Fig. 6a, and in-plane control, with one possibility depicted in Fig. 6b. A number of workers have already implemented the electronically addressed coupler [4,5], which may serve as a Fredkin Gate with an electronic C-input. To symmetrize the system, one may use photodetection combined with the electro-optic coupler to facilitate optical control. A more advanced technology would be the use of photorefractive material for direct optical control of the coupling constant. The example in (b) is a waveguide coupler incorporating highly anisotropic guides containing nonlinear material. The two coupling waves (A and B) are introduced with the same polarization so that they can couple, while the control signal C is orthogonally polarized in such a way that its power is used to activate the coupling between the A and B channels but it does not couple itself into the other guide [6].

Proposed  Devices  Incorporating  Fredkin  Gates:  We  demonstrate  the 
applicability  of  these  new  gates  by  proposing,  in  addition  to  the  conventional 
logic  gates,  two  very  useful  devices  that  incorporate  arrays  of  the  waveguide 
gates  shown  in  Fig.  6. 

The Optical Crossbar: The gate array of Fig. 7 may be constructed of gates of the type depicted in Fig. 6a or of the type in Fig. 6b. In the first case each gate may be accessed randomly from above by an electric field or by light, depending on the specific device used. As we are dealing with optical computing, we might prefer activation by light, as in a holographic coupler [8] or a fiber coupler. With proper addressing, each input line can be coupled to each output line. This system may prove to be an extremely fast and efficient crossbar, or optical switchboard. The in-plane addressing of Fig. 6b is applicable if one desires to activate a whole column together. At first sight this kind of addressing appears unsuitable for random access; however, with very fast pulses it also becomes feasible.

The Tapped Delay Line: The basic configuration of Fig. 8a is a tapped delay line. A fiber ring may be utilized for long delays, while for very short delays one may use waveguide rings, the feasibility of which has also been demonstrated [9,10]. Here too, the addressing may be of the first type (Fig. 6a) or of the second type (Fig. 6b). Such a setup may be used to delay all the energy in a pulse, or just part of it to produce a pulse train from a single initial pulse. A slight modification of the system, illustrated in Fig. 8b, may be used to reverse the direction of signal flow, resulting in a truly reversible Fredkin gate. In the future, an optical memory block may resemble the array depicted in Fig. 8c. This appears to be a short-term memory, but with the integration of an amplifying medium it may also serve as a long-term memory.
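To illustrate how a single ring tap can either delay a whole pulse or spread it into a pulse train, here is a minimal Python sketch. It assumes an idealized, lossless 2×2 coupler with power cross-coupling fraction `kappa` and a ring with neither loss nor gain; both the model and the name `ring_pulse_train` are assumptions for illustration, not the design of Fig. 8.

```python
def ring_pulse_train(kappa: float, n_round_trips: int) -> list[float]:
    """Energy leaving the tap in each time slot for a unit-energy input pulse.

    Slot 0 holds the energy that passes the coupler without entering the ring;
    slot m (m >= 1) holds the energy that exits after m round trips.
    Assumes a lossless coupler with power cross-coupling fraction kappa.
    """
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    train = [1.0 - kappa]      # bar path: straight through, no delay
    circulating = kappa        # cross path: coupled into the ring
    for _ in range(n_round_trips):
        train.append(circulating * kappa)   # cross-coupled back out after a round trip
        circulating *= 1.0 - kappa          # the bar-path share keeps circulating
    return train


print(ring_pulse_train(1.0, 3))  # [0.0, 1.0, 0.0, 0.0]: the whole pulse delayed once
print(ring_pulse_train(0.5, 3))  # [0.5, 0.25, 0.125, 0.0625]: a decaying pulse train
```

With kappa = 1 all the energy emerges exactly one round trip late (a pure delay); with 0 < kappa < 1 the output is a geometrically decaying pulse train, and the emitted energy plus whatever is still circulating always equals the input energy.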

Discussion: Conventional approaches to optical computing have followed the lines put forward by workers with electronic systems. Traditional logic gates are well suited for electronic computing but may not be the best choice for optical processors. In this work we indicated that one should also consider different implementations for optical computing systems, one very promising possibility being the Fredkin gate. These gates have many simple optical implementations and may prove to be very fast and energy efficient. The various implementations and applications given here are just samples to indicate the diverse possibilities available.
(b) Work in Progress

Work has been directed toward an optical neighborhood processor designed to operate on two-dimensional arrays of 1's and 0's. The primary application will be to two-dimensional image processing. We have achieved a design for an optically implemented table look-up device, which will be the heart of this processor. The architecture for this table look-up device is illustrated in Figure 1.

Input to this device consists of a 9-bit string of 1's and 0's. In image processing applications, this string will originate from a 3X3 neighborhood. Output from the device is a 1 or a 0, depending on whether or not the string is in the table. As shown in Fig. 1, the input is fed into two banks of 9 LEDs. In one bank, an LED is lit in each position corresponding to a 1 in the input string. In the other bank, only the positions corresponding to 0's are lit. There are two tables of masks. In the first mask table, positions corresponding to 1's are clear, allowing light to pass through, while in the second, positions corresponding to 0's are clear.

Light from the LEDs passes through the mask tables. A maximum amount of light passes through when the lit LEDs coincide with the clear mask positions. Thus, all table entries which have 1's in the same positions as the input string will allow a maximum amount of light to pass through in the "1's channel" of the device. Similarly, table entries which have 0's in the same positions as the input string will allow a maximum amount of light through in the "0's channel". However, only the one table entry which has both 1's and 0's in the same positions as the input string (i.e., which exactly matches the input string) will allow the maximum amount of light (the intensity of 9 LEDs) through both channels at the same time.

Light which passes through the mask tables is collected onto detectors, summed together, and fed into a threshold device. The threshold can be set, for example, at 8.5 (in units of the intensity of one LED), so that only the light from 9 LEDs will be greater than the threshold. Since this amount of light will pass through only when the input string exactly matches a table entry, the threshold device will determine when a match has occurred.
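The matching rule of the table look-up device can be checked in software. The Python sketch below assumes one detector pair per table entry, so that the 1's-channel and 0's-channel light for the same entry are summed before thresholding, and measures light in units of one LED; the function names are hypothetical. An entry registers a match only when its summed light exceeds the 8.5 threshold, i.e., when all nine positions agree.

```python
def led_patterns(bits: str) -> tuple[str, str]:
    """LED banks for an input string: (1's channel, 0's channel), L = lit, U = unlit."""
    ones = "".join("L" if b == "1" else "U" for b in bits)
    zeros = "".join("L" if b == "0" else "U" for b in bits)
    return ones, zeros


def channel_light(bits: str, entry: str) -> int:
    """Light reaching one table entry's detectors, in units of one LED."""
    ones_light = sum(b == "1" and e == "1" for b, e in zip(bits, entry))   # 1's mask clear at entry 1's
    zeros_light = sum(b == "0" and e == "0" for b, e in zip(bits, entry))  # 0's mask clear at entry 0's
    return ones_light + zeros_light


def table_lookup(bits: str, table: list[str], threshold: float = 8.5) -> int:
    """Return 1 if any table entry drives its detectors above threshold, else 0."""
    return int(any(channel_light(bits, entry) > threshold for entry in table))


table = ["011000001", "111000001", "001000001"]
print(led_patterns("011000001"))                        # ('ULLUUUUUL', 'LUULLLLLU')
print([channel_light("011000001", e) for e in table])   # [9, 8, 8]
print(table_lookup("011000001", table))                 # 1
print(table_lookup("011000001", table[1:]))             # 0
```

Entries that differ from the input in a single position still pass 8 units of light, which is why the threshold must sit between 8 and 9.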

By transforming 3X3 array neighborhoods into 9-bit strings, this table look-up device can be used as a cellular array processor for applications such as image processing. One possible transformation can be described as follows. Consider an array with coordinates labeled as shown:


(1,1)  (1,2)  (1,3)  (1,4)
(2,1)  (2,2)  (2,3)  (2,4)
(3,1)  (3,2)  (3,3)  (3,4)

The 3X3 neighborhood in the upper left corner can be transformed to the string

(1,3) (2,3) (3,3) (1,2) (2,2) (3,2) (1,1) (2,1) (3,1)

The next neighborhood to be transformed is obtained by moving one position to the right. This neighborhood is transformed to

(1,4) (2,4) (3,4) (1,3) (2,3) (3,3) (1,2) (2,2) (3,2).

New information is added at the 'top' of the string, while old data drops out of the 'bottom'. The advantage of this transformation scheme is that it allows a smooth flow of data from the two-dimensional array to the input bit string.
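The column-streaming scheme maps naturally onto a fixed-length queue. The Python sketch below is an illustration, not the report's hardware (`stream_neighborhoods` is a hypothetical name). It pushes each new column of a three-row strip onto the front of a nine-element window so that the oldest column falls off the back, and it reproduces the two strings given in the text for the 3X4 example array.

```python
from collections import deque
from collections.abc import Iterator


def stream_neighborhoods(strip: list[list]) -> Iterator[list]:
    """Yield the 9-element string for each 3X3 neighborhood of a 3-row strip.

    Each new column enters at the 'top' (front) of the string; once nine
    elements are held, it pushes the oldest column off the 'bottom' (back).
    """
    if len(strip) != 3:
        raise ValueError("this sketch handles a strip of exactly three rows")
    window = deque(maxlen=9)
    for col in range(len(strip[0])):
        for row in (2, 1, 0):        # appendleft in reverse keeps the order rows 1, 2, 3
            window.appendleft(strip[row][col])
        if col >= 2:                 # a full 3X3 neighborhood is now held
            yield list(window)


labels = [[f"({r},{c})" for c in range(1, 5)] for r in range(1, 4)]
for string in stream_neighborhoods(labels):
    print(" ".join(string))
# (1,3) (2,3) (3,3) (1,2) (2,2) (3,2) (1,1) (2,1) (3,1)
# (1,4) (2,4) (3,4) (1,3) (2,3) (3,3) (1,2) (2,2) (3,2)
```

Only three new elements enter per step, which is the "smooth flow" property: the data path never has to re-read the six elements shared with the previous neighborhood.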
3. CONCLUSIONS

From these preliminary studies, some equally preliminary conclusions can be drawn. First, high-accuracy results do not always require high-accuracy processors. By clever use of appropriate electronics, we can combine some of the virtues of high-speed analog optical processing with the accuracy normally associated with high-speed digital processing. We conclude, as well, that ultimately high-speed optical processing may very well utilize Optical Fredkin Gates. In any case, further study of this technique seems amply justified.
Example: Search the table for the binary string '011000001'. In the "one's channel", the LEDs are lit with the pattern 'ULLUUUUUL' (U = Unlit, L = Lit), while in the "zero's channel" the pattern is 'LUULLLLLU'.
ABSTRACT


An optical-hybrid computer is compared in speed with digital computers. Optical-hybrid computers are shown to be far superior in speed in solving systems of linear equations, and this advantage increases with the size of the matrix. The convergence of the solution obtained with the optical-hybrid computer is discussed, and it is found that optical systems with an error of about 5% assure convergence for matrices with condition numbers as high as 150. Some means of improving the condition number of a matrix are also introduced.


I. INTRODUCTION

Analog optics is very attractive for signal processing and computing because of its ability to process two-dimensional data in parallel very rapidly. Unfortunately, this high-speed parallel processing achieves only low accuracy because of the analog nature of the processing, especially in optical systems. These accuracy problems arise from errors in representing and reading the signal with the electro-optic I/O devices. The method introduced by Caulfield [1] (which is described in the first part of this report) combines the high speed and parallelism of the optical computer with the high accuracy of the digital computer, using Lord Kelvin's iterative method [2]. In Section II of this paper we compare the time required to solve a system of linear equations with the optical-hybrid computer against that required by a digital computer. In Section III we present a numerical analysis of the convergence of the solution of a system of linear algebraic equations as a function of the condition number of the matrix, using a computer simulation of the optical-hybrid computer. Section IV presents conclusions and final remarks.

II. COMPUTATION SPEED ANALYSIS

The optical-hybrid computer works in the following manner* for a system of linear equations

$$ A x = b. \qquad (1) $$

a) Using an optical analog processor, we can calculate an approximate solution x° of the linear system; the superscript °'s indicate inaccuracies in the optics and electronics:

$$ A^{\circ} x^{\circ} = b^{\circ}. \qquad (2) $$

b) Store the solution x° to high accuracy. Use a dedicated digital processor to calculate the residue

$$ r = b - A x^{\circ} = A\,(x - x^{\circ}) = A\,\Delta x. \qquad (3) $$

c) Use the optical analog processor to solve the linear equation

$$ A^{\circ} y^{\circ} = s\,r, \quad \text{where } y = s\,\Delta x, \qquad (4) $$

for Δx, where s is a "radix", or scale factor, chosen to make good use of the dynamic range.

d) Use the digital processor to refine the solution for x:

$$ x^{1} = x^{\circ} + \Delta x. \qquad (5) $$


*  It  is  also  applicable  to  other  problems  -  both  linear  and  nonlinear. 
If the refined solution x¹ is accurate enough, terminate the iterations. Otherwise, return to steps (b), (c), and (d), using x¹ in place of x°, to obtain a more refined solution.
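Steps (a) through (d) can be tried out numerically. The Python/NumPy sketch below is a simplified stand-in, not the authors' simulation program. The analog optical solver is modeled as an exact `np.linalg.solve` applied to a perturbed matrix A° (a Gaussian mask error drawn once, with standard deviation a fraction of the largest |a_ij|), plus fresh Gaussian I/O errors on each input and output vector, loosely following the error model described in Section III. The 1% error levels, the choice s = 1/max|r_i|, and the function names are assumptions.

```python
import numpy as np


def make_analog_solver(A, rng, mask_err=0.01, io_err=0.01):
    """Return a hypothetical noisy 'optical' solver for A y = v.

    The mask error is drawn once (a fixed, imperfect mask); input and output
    errors are redrawn on every call. All errors are Gaussian, scaled to the
    largest magnitude in the quantity being represented.
    """
    A_mask = A + rng.normal(0.0, mask_err * np.abs(A).max(), A.shape)

    def solve(v):
        v_in = v + rng.normal(0.0, io_err * np.abs(v).max(), v.shape)
        y = np.linalg.solve(A_mask, v_in)
        return y + rng.normal(0.0, io_err * np.abs(y).max(), y.shape)

    return solve


def hybrid_solve(A, b, tol=1e-10, max_iter=100, seed=0):
    """Iterative refinement: noisy analog solves plus exact digital residues."""
    rng = np.random.default_rng(seed)
    analog = make_analog_solver(A, rng)
    x = analog(b)                                  # step (a): x° from A° x° = b°
    for k in range(max_iter):
        r = b - A @ x                              # step (b): residue in full precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k                            # k refinement passes were needed
        s = 1.0 / np.abs(r).max()                  # step (c): radix fills the dynamic range
        delta_x = analog(s * r) / s                # solve for y = s Δx, then unscale
        x = x + delta_x                            # step (d): refine the solution
    raise RuntimeError("no convergence within max_iter iterations")


A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, passes = hybrid_solve(A, b)
print(passes, np.allclose(A @ x, b, atol=1e-9))
```

Each pass shrinks the error by roughly the relative error of the analog solve, so a 1%-class processor still reaches near machine precision after a modest number of passes for a well-conditioned matrix; raising `mask_err` or the condition number of `A` increases the pass count, in line with the trends reported in Section III.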

To get some quantitative values for the speed of this process compared with that of digital computers, we calculate the number of operations required by each method and multiply it by the time required by each operation.

Let us consider an n×n matrix A. The time required for one iteration as outlined above, Q_{O1}, is given by [3]

$$ Q_{O1} = 2Q_{A1} + (n^2 + 3n)\,Q_{D1}, \qquad (6) $$

where Q_{A1} is the time required to solve Ax = b by analog optics and Q_{D1} is the time required to make one digital operation.

Therefore the time required to make I iterations is given by

$$ Q_O = I\,Q_{O1} = I\,[\,2Q_{A1} + (n^2 + 3n)\,Q_{D1}\,]. \qquad (7) $$

The digital computer, by contrast, needs n^3/3 operations to solve the linear equation by Gaussian elimination, and Cholesky's method (applicable to symmetric positive-definite matrices) reduces the number of operations to n^3/6. Hence the time required to solve the linear equation digitally, Q_D, is given by

$$ Q_D = \frac{n^3}{6}\,Q_{D1}. \qquad (8) $$

Comparing Eqs. (7) and (8), the hybrid scheme has a clear time advantage when

$$ Q_O \ll Q_D, \qquad (9) $$

that is, we want

$$ I\,[\,2Q_{A1} + (n^2 + 3n)\,Q_{D1}\,] \ll \frac{n^3}{6}\,Q_{D1}, \qquad (10) $$

or

$$ 2I\,Q_{A1} \ll \left[\frac{n^3}{6} - I\,(n^2 + 3n)\right] Q_{D1}, \qquad (11) $$

or

$$ \frac{n^3/6 - I\,(n^2 + 3n)}{2I}\cdot\frac{Q_{D1}}{Q_{A1}} \gg 1. \qquad (12) $$

The speed advantage of the optical-hybrid over the digital computer is evident from Eq. (12), and it increases with the size n of the matrix. Eq. (12) can be rewritten in the form

$$ A_P\,A_I \gg 1, \qquad (13) $$

where

$$ A_P = \frac{n^3/6 - I\,(n^2 + 3n)}{2I}, \qquad (14) $$

$$ A_I = \frac{Q_{D1}}{Q_{A1}}. \qquad (15) $$

Here A_I is an "inherent advantage". A single analog operation is much faster than a digital one. The whole analog Ax = b solution will be slower than a single digital operation, but the analog optical Ax = b solver works at speeds independent of n. On the other hand, Q_{A1} is eigenvalue dependent. A_P is a problem-related advantage: as n increases, A_P increases very rapidly (as n^3 for a fixed number of iterations). Clearly, we also want to keep the number of iterations I low.
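Equations (7), (8), (14), and (15) are simple enough to evaluate directly. The short Python sketch below (the function name is hypothetical) computes Q_O, Q_D, A_P, and A_I for given n, I, Q_A1, and Q_D1. The timing values plugged in are placeholders chosen for illustration, not measurements from this report. Note that A_P is negative when I(n^2 + 3n) exceeds n^3/6, roughly when n < 6I; in that regime the hybrid scheme cannot win however fast the optics is.

```python
def speed_comparison(n: int, iterations: int, q_a1: float, q_d1: float) -> dict[str, float]:
    """Evaluate Eqs. (7), (8), (14) and (15) for an n x n system."""
    q_o = iterations * (2 * q_a1 + (n**2 + 3 * n) * q_d1)               # Eq. (7): hybrid time
    q_d = (n**3 / 6) * q_d1                                              # Eq. (8): Cholesky time
    a_p = (n**3 / 6 - iterations * (n**2 + 3 * n)) / (2 * iterations)   # Eq. (14)
    a_i = q_d1 / q_a1                                                    # Eq. (15)
    return {"Q_O": q_o, "Q_D": q_d, "A_P": a_p, "A_I": a_i}


# Placeholder timings (assumed): 10 ns per digital operation, 1 us per analog solve.
for n in (50, 100, 1000):
    r = speed_comparison(n, iterations=10, q_a1=1e-6, q_d1=1e-8)
    print(n, f"Q_D/Q_O = {r['Q_D'] / r['Q_O']:.2f}", f"A_P*A_I = {r['A_P'] * r['A_I']:.1f}")
```

Subtracting Eq. (7) from Eq. (8) gives Q_D − Q_O = 2I Q_A1 (A_P A_I − 1), so the hybrid is faster exactly when A_P A_I > 1; condition (13) asks for this product to be much greater than 1.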

From the above discussion we see that the optical-hybrid computer can achieve results in a much shorter time, especially for large matrices. But does this process always work, that is, does it converge?


III. CONVERGENCE OF THE SOLUTION

The block diagram of the optical-hybrid computer is shown in Fig. 1. The solution of the linear algebraic equation will be done optically using the method introduced by Cheng and Caulfield [4]. The question of convergence is discussed in the previous paper, where it is found that if the matrix has positive eigenvalues, the solution will converge regardless of the size of the matrix. This, of course, applies only to step (c). We turn next to the total process.

In this section we present a numerical analysis of the convergence of the solution and its dependence on the condition number of the matrix. The condition number of the matrix A is defined as

$$ \kappa(A) = \|A\|\,\|A^{-1}\|, \qquad (16) $$

where ‖·‖ is a matrix norm. The condition number is a measure of the accuracy of the Ax = b solutions: the larger the condition number, the less accurate the result achieved with any fixed-accuracy computer. In this paper we report a computer simulation of the system shown in Fig. 1 to study the convergence of the solution of the linear equation. The algorithm simulates the analog optical processor and the electro-optic I/O devices in a way that allows us to control the errors occurring in representing the matrix by an optical mask, in reading the photodiode voltage, and in converting the voltage input to the system into light in the LEDs. To simulate the experimental environment, we used a Gaussian random number generator to generate the error signals.

The curve shown in Fig. 2 is the result of a simulation experiment for an optical-hybrid computer with the following characteristics: the matrix A is represented by an optical mask (a photographic film or a spatial light modulator) with an error equal to 1% of the maximum coefficient of the matrix; the vector x is read in the electronics with an error of standard deviation [illegible]% of the maximum element of x; and the standard deviation of the error in representing the vector b by the photodiode is 1%. Fig. 2 shows that the solutions converge to an error of less than 10^? (or any other accuracy) even for a condition number of 500. For condition numbers less than 250, fewer than 20 iterations are required. To guarantee convergence with 1% accuracies, we must restrict matrices to condition numbers less than 50.

To study the effect of the error in representing the matrix by an optical mask on the number of iterations needed to get a solution within 10^? error, we varied the standard deviation of the mask error over the range from 1% to 30% for a condition number of 150 and calculated the number of iterations required in each case. Fig. 3 shows the number of iterations as a function of the standard deviation of the error in representing the matrix. As the error increases, the number of iterations increases in an almost linear way. Even for a 30% error in representing the matrix, the solution still converges. This interesting result shows that even with inaccurate optics, the optical-hybrid computer can still solve the linear equation very accurately. This result is experimental and therefore not guaranteed. It appears, however, that this approach often works even in these unguaranteed cases.

The condition number is one of the determining factors of the speed of convergence of the solutions, as can be seen from Fig. 3. Smaller condition numbers yield faster convergence. In searching for a way to improve the condition number of a given matrix, we found that one way is to normalize each row of the matrix in the following manner:

$$ a'_{ij} = \frac{a_{ij}}{\left(a_{i1}^2 + a_{i2}^2 + \cdots + a_{in}^2\right)^{1/2}}, \qquad i = 1, 2, \ldots, n, \qquad (17) $$

where the a_{ij} are the coefficients of the matrix A. This normalization decreases the condition number of the matrix, which in turn increases the speed of convergence. Fig. 4 shows a plot of the condition number before and after normalization of the matrix, from which we can see an improvement in the condition number after normalization.
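Equation (16) and the row normalization of Eq. (17) can be explored with NumPy. The sketch below is illustrative: it uses `np.linalg.cond` (2-norm condition number by default) and divides each row of A by its Euclidean norm. Eq. (17) does not say so explicitly, but each b_i must be divided by the same factor for the normalized system to have the same solution; the sketch assumes this. Row normalization helps most when the rows differ greatly in scale, as in the example; it is not guaranteed to lower the condition number of every matrix.

```python
import numpy as np


def row_normalize(A: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Scale each equation so that row i of A has unit Euclidean norm (Eq. 17)."""
    row_norms = np.linalg.norm(A, axis=1)               # (a_i1^2 + ... + a_in^2)^(1/2)
    return A / row_norms[:, np.newaxis], b / row_norms  # b_i scaled by the same factor


A = np.array([[1.0, 2.0], [3000.0, 4000.0]])   # rows of very different scale
b = np.array([5.0, 6.0])
A_n, b_n = row_normalize(A, b)
print(f"kappa before: {np.linalg.cond(A):.1f}, after: {np.linalg.cond(A_n):.1f}")
print(np.allclose(np.linalg.solve(A, b), np.linalg.solve(A_n, b_n)))  # True
```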

IV. CONCLUSIONS

The optical-hybrid computer discussed in this paper has shown very promising results; it is clearly faster than the digital computer in solving this class of problems, and its speed advantage increases with the size of the matrix. The analysis carried out above is not limited to the solution of a system of linear equations but is applicable as well to other linear and nonlinear problems. A calculation quite similar to the speed comparison applies to power consumption. Another interesting result presented here is that the optics used in the system can tolerate errors of 5 to 10% without sacrificing the accuracy of the solution, although the smaller the error in both optics and electronics, the faster the solution converges.


REFERENCES 

1. H. John Caulfield, J. H. Gruninger, J. E. Ludman, K. Steiglitz, H. Rabitz, and J. Gelfand, "Bimodal Optical Computers," to be published.

2. W. Thomson (Lord Kelvin), Proc. Roy. Soc. (London) 28, 111 (1878).

3. G. W. Stewart, Introduction to Matrix Computations, Academic Press, 1973.

4. Wai K. Cheng and H. John Caulfield, "Fully-Parallel Relaxation Algebraic Operations for Optical Computers," Opt. Comm. 43, No. 4, 251 (1982).


[Figure captions, partially recovered: "...optical-hybrid computer."; "...function of the condition number of the matrix."]


PROBE  SYSTEMS  DIVISION 
ULTRASYSTEMS  DEFENSE  AND  SPACE  SYSTEMS,  INC 
655  North  Pastoria  Avenue 
Sunnyvale,  CA  94086 


A-0  COMPUTING  STUDY 
FINAL  REPORT 

PS-ER-5652-01 


31  December  1985 




TABLE OF CONTENTS

Section  Title

1.  OPTICAL J-K FLIP-FLOP ANALYSIS INTRODUCTION

1.1  Optical J-K Flip-Flop Description

1.2  J-K Flip-Flop Modes of Operation

2.  SYNCHRONIZATION EFFECTS IN THE OPTICAL J-K FLIP-FLOP

2.1  The Effects of Optical Path Length Errors

2.2  The Effects of Asynchronization of Input Signals

2.3  The Effects of Pulse Dispersion in the Thresholding Gain Mechanism

2.4  Effects of Mismatch Between Input Period and Round Trip Delay

3.  THRESHOLD AND WEIGHTING ANALYSIS

3.1  Noncoherent Optical J-K Flip-Flop Analysis

3.2  Coherent Optical J-K Flip-Flop Analysis


LIST OF ILLUSTRATIONS

Proposed Optically Implemented J-K Flip-Flop
Relationship Between Inputs and Outputs of Ideal Thresholding Gain Device
Ideal Operation of Optical J-K Flip-Flop
Multiplexed Operation of Optical J-K Flip-Flop
Optical J-K Flip-Flop Operating at 100% Duty Cycle
The Effects of a +0.1% Error in the Length of the X Feedback Path
The Effects of a -0.1% Error in the Length of the X Feedback Path
The Effects of a +0.1% Error in the Length of the Y Feedback Path
The Effects of a -0.1% Error in the Length of the Y Feedback Path
The Effects of a +0.1% Error in the Length of the Z Feedback Path
The Effects of a -0.1% Error in the Length of the Z Feedback Path
The Long-Term Effects of a +0.1% Error in the Length of the Feedback Path
The Effects of the J Input Leading the K Input by 1%
The Effects of the J Input Lagging the K Input by 1%
The Effects of Pulse Width Dispersion of 0.2%
The Effects of Match Filtering on Spurious Signals Introduced by a 1% Error in the X Path Length
The Effects of the Input Clock Rate 5% Higher Than the Internal Clock Rate
Noncoherent Implementation of Optical J-K Flip-Flop
Equivalent Model of Noncoherent Optical Interconnect and Threshold
Probability Density Functions at Input to Threshold and Probability of Error Calculation for "OR" Operation
Probability Density Functions at Input to Threshold and Probability of Error Calculation for "AND" Operation
Equivalent Model of Coherent Optical Interconnect and Threshold
Phasor Representation of Complex Weighting of Input Signal
Phasor Representation of the Sum of Two Arbitrary Signals With Noise Components
Chi-Squared Density Function
Optimum Weights and Threshold for X Logic Operation
Optimum Weights and Threshold for Z Logic Operation
Optimum Weights and Threshold for "OR" Logic Operation
Optimum Weights and Threshold for "Exclusive OR" Operation
Optimum Weights and Threshold for "AND" Logic Operation


The proposed implementation of an optical J-K flip-flop is shown in Figure 1-1. In this implementation, the J and K inputs are externally supplied binary laser signals. Each binary signal is represented by on-off modulation of a laser carrier. These signals enter the flip-flop through a beam splitter and pass through a holographic interconnection array. The interconnection array provides a weighted interconnect between inputs and outputs as shown.

The output signals then pass through a thresholding gain mechanism. This mechanism amplifies signals with amplitudes greater than a built-in threshold and does not pass signals below the threshold. Ideally, the outputs of the thresholding gain mechanism are related to the inputs by Equations (1-1) and (1-2), as shown in Figure 1-2.


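$$ \text{input} > \text{threshold} \;\Rightarrow\; \text{output} = 1 \qquad (1\text{-}1) $$

$$ \text{input} < \text{threshold} \;\Rightarrow\; \text{output} = 0 \qquad (1\text{-}2) $$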

In  the  case  shown  in  Figure  1-1  a  threshold  of  1/2  is  used.  Some  of 
the  signals  are  fed  back  from  the  thresholding  gain  array  to  the  holographic 
interconnect  by  a  series  of  mirrors.  This  path  not  only  provides  feedback  but 
also  provides  an  inherent  time  delay  that  serves  as  an  internal  clock  of  the 
flip-flop.  Finally,  at  the  output  of  the  thresholding  gain  mechanism,  part  of 
the  output  of  the  flip-flop  exits  through  a  beam  splitter. 


The key difference between the operation of this optical flip-flop and an electronic flip-flop is that the optical version lacks an externally supplied clock signal. The clock in the electronic flip-flop provides control over the sequencing of the state changes of the flip-flop output. Usually, flip-flops are designed such that the current state of the output depends on the states of the inputs and of the output at the last leading (or trailing) edge of the clock pulse. The truth table of an electronic J-K flip-flop is shown in Table 1-1.
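For reference while reading the simulation results below, the ideal thresholding rule of Eqs. (1-1) and (1-2) and the standard J-K next-state rule (hold, reset, set, toggle) can be written in a few lines of Python. This is a behavioral sketch, not a model of the optics: the threshold value 1/2 follows Figure 1-1, the output of 0 for an input exactly at threshold is an assumption because Eqs. (1-1) and (1-2) leave that case open, and the function names are hypothetical.

```python
def threshold_gain(signal: float, threshold: float = 0.5) -> int:
    """Ideal thresholding gain device: 1 above threshold, 0 otherwise (Eqs. 1-1, 1-2)."""
    return 1 if signal > threshold else 0


def jk_next(q: int, j: int, k: int) -> int:
    """Standard J-K characteristic equation: Q+ = J*not(Q) + not(K)*Q."""
    return (j & (1 - q)) | ((1 - k) & q)


print([threshold_gain(s) for s in (0.25, 0.75)])  # [0, 1]

# Truth table: J=0,K=0 holds; J=0,K=1 resets; J=1,K=0 sets; J=1,K=1 toggles.
for j, k in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(j, k, [jk_next(q, j, k) for q in (0, 1)])

# Basic mode with J = K = 1 on every cycle: the output alternates.
q, outputs = 0, []
for _ in range(4):
    q = jk_next(q, 1, 1)
    outputs.append(q)
print(outputs)  # [1, 0, 1, 0]
```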

Since the optical flip-flop lacks an external clock, the input signal period must be matched quite accurately to the internal feedback delay time. Since the feedback signals travel at the speed of light over relatively short paths (<1 m), the I/O rate of the optical flip-flop can be made many times faster than that of an electronic flip-flop.

1.2  J-K Flip-Flop Modes of Operation

To begin the analysis, the J-K flip-flop was modeled on a VAX 11/750 under ideal conditions (no losses, no dispersion, use of the gain curve of Figure 1-2, equal optical feedback paths, and perfect synchronization of inputs with internal clocking) to verify the operation of the flip-flop. In addition, three modes of operation were simulated.

In the "basic" mode of operation, the J and K inputs are pulsed (e.g., at a 25% duty cycle) synchronously with the built-in delay. Figure 1-3 shows a timing diagram of the simulated inputs and outputs of the optical J-K flip-flop. First, the figure shows the case where both the J and K inputs are clocking a steady stream of "ones", and the Y output cycles alternately between "one" and "zero" (note: the outputs are delayed by 1/2 cycle). The remaining cases shown in Figure 1-3 are: (1) J high and K low (Y high), (2) J low and K high (Y low), and (3) J and K low (the last value of Y is latched into the flip-flop). Comparison with the truth table (Table 1-1) shows that the optical J-K flip-flop performs as predicted.
Since the state of the output depends only on the state of the inputs one-half clock cycle back, and is entirely independent of the state of the inputs at any other time, the flip-flop can be operated in a time-multiplexed mode, in which several bits are input within a single feedback delay. Figure 1-4 shows an example of this mode of operation, in which four inputs are time multiplexed into the flip-flop. The pulse width of any one of the inputs is 1/8th the internal delay period. In the figure, the J and K lines of the first and third inputs are held high, so the first and third outputs cycle between high and low. The second input has J high and K low, so its output is high. The fourth input has J low and K high, so its output is low. Again, the multiplexed operation can be verified against the J-K flip-flop truth table (Table 1-1).
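Because each time slot depends only on its own inputs one feedback period earlier, an N-level time-multiplexed flip-flop behaves like N independent J-K flip-flops sharing one loop. The Python sketch below models it that way. It abstracts each feedback period to a single update and assumes that all slots start low. The function names are hypothetical, and the input pattern mirrors the Figure 1-4 description rather than the actual simulation data.

```python
from typing import List, Sequence, Tuple


def jk_next(j: int, k: int, q: int) -> int:
    # Ideal J-K characteristic equation: Q+ = J*(not Q) + (not K)*Q.
    return (j & (1 - q)) | ((1 - k) & q)


def run_multiplexed(frames: Sequence[Sequence[Tuple[int, int]]], n_slots: int) -> List[List[int]]:
    """Treat an N-slot time-multiplexed flip-flop as N independent flip-flops.

    frames[t][s] is the (J, K) pair entering slot s during feedback period t.
    Returns the output of every slot after each period.
    """
    state = [0] * n_slots  # assumption: all slot outputs start low
    history = []
    for frame in frames:
        state = [jk_next(j, k, q) for (j, k), q in zip(frame, state)]
        history.append(list(state))
    return history


if __name__ == "__main__":
    # Slots 1 and 3: J = K = 1 (toggle); slot 2: J = 1, K = 0; slot 4: J = 0, K = 1.
    frame = [(1, 1), (1, 0), (1, 1), (0, 1)]
    for outputs in run_multiplexed([frame] * 4, n_slots=4):
        print(outputs)
```

Slots 1 and 3 alternate between 1 and 0, slot 2 stays high, and slot 4 stays low, as described for Figure 1-4.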

Finally,  since  the  flip-flop  is  not  constrained  to  operate  on  the 
leading  or  trailing  edge  of  a  clock  pulse,  a  mode  of  operation  can  be 
hypothesized  where  the  inputs  are  pulsed  at  a  100%  duty  cycle.  Figure  1-5 
demonstrates  this  mode  of  operation  with  inputs  analogous  to  those  used  in 
Figure  1-3.  Again,  the  operation  of  the  J-K  flip-flop  can  be  verified  using  the 
truth  table  (Table  1-1). 

In each of the cases discussed above, the operating speed is set by the round trip delay time (t_rt) of the feedback paths. The time between successive data entries must equal 2 × t_rt/N for N-level multiplexing (N = 1 for nonmultiplexed operation).
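The timing relation above can be made concrete with a little arithmetic. The Python sketch below assumes that t_rt is simply the loop length divided by the speed of light in the medium. A 1 m loop in air, with the refractive index taken as 1, is used as a hypothetical example. The sketch then applies the 2 × t_rt/N rule for the data-entry interval.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_delay(path_length_m: float, refractive_index: float = 1.0) -> float:
    """Delay t_rt (s) for light traversing a feedback loop of the given length once."""
    return path_length_m * refractive_index / C_LIGHT


def entry_interval(t_rt: float, n_levels: int = 1) -> float:
    """Time between successive data entries: 2 * t_rt / N (N = 1: no multiplexing)."""
    return 2.0 * t_rt / n_levels


if __name__ == "__main__":
    t_rt = round_trip_delay(1.0)  # hypothetical 1 m loop in air (index taken as 1)
    for n in (1, 4):
        dt = entry_interval(t_rt, n)
        print(f"N={n}: t_rt = {t_rt * 1e9:.3f} ns, entry interval = {dt * 1e9:.3f} ns, "
              f"rate = {1.0 / dt / 1e6:.1f} MHz")
```

For a 1 m loop this gives t_rt of about 3.34 ns, or one data entry roughly every 6.7 ns without multiplexing.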
2. SYNCHRONIZATION EFFECTS IN THE OPTICAL J-K FLIP-FLOP.


This section discusses the effects of the asynchronous arrival of signals at the thresholding gain mechanism. It does not address the effects of phase errors on the destructive coherent interference used to obtain negative weights in the coherent system; the discussion of phase errors is deferred to Section 3.2. This section addresses only the more general, "idealized" effects of synchronization, and probably pertains more to the noncoherent case discussed in Section 3.1.

2.1 The Effects of Optical Path Length Errors.

The proposed optical J-K flip-flop contains three optical feedback paths (labeled X, Y, and Z in Figure 1-1). Errors in the length of any of these paths would cause pulses to arrive asynchronously at the thresholding gain mechanism, which introduces spurious "glitchy" signals at the output of the flip-flop. Examples of the spurious signals introduced by errors of ±0.1% in each of the optical feedback paths (X, Y, and Z) are shown in Figures 2-1 through 2-6. In general, the narrow "glitches" can be minimized by internal or external filtering. However, as shown in Figure 2-7, in cases where the inputs remain fixed over many cycles, the glitches spread in time and become more likely to produce errors at the output.
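The size of these glitches can be estimated with a simple geometric model. The Python sketch below is for illustration only. It assumes that a relative length error ε shifts a recirculating pulse by ε·t_rt on every pass, so that the offset accumulates when the inputs are held fixed. It also assumes that each glitch is the non-overlapping sliver between two equal rectangular pulses. Neither the accumulation model nor the function names come from this report.

```python
def misalignment(t_rt: float, rel_error: float, passes: int = 1) -> float:
    """Timing offset after `passes` trips around a path whose length is off by rel_error.

    Assumption (illustrative): the offset simply accumulates on every pass.
    """
    return passes * rel_error * t_rt


def glitch_width(pulse_width: float, offset: float) -> float:
    """Width of each non-overlapping sliver (one per edge) of two offset rectangular pulses."""
    return min(abs(offset), pulse_width)


if __name__ == "__main__":
    t_rt = 1.0         # times normalised to the round-trip delay
    tau = 0.25 * t_rt  # 25% duty-cycle pulses, as in the basic mode
    for k in (1, 10, 100):
        off = misalignment(t_rt, 0.001, k)  # +0.1% path-length error
        print(k, glitch_width(tau, off) / tau)  # glitch width as a fraction of the pulse
```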
2.2 The Effects of Asynchronization of Input Signals.

Another possible source of asynchronous pulse arrival at the gain mechanism is any skew of the input signals. These effects are shown in Figures 2-8 (J leads K by 1%) and 2-9 (J lags K by 1%). In both cases, spurious outputs are again introduced.

2.3 The Effects of Pulse Dispersion in the Thresholding Gain Mechanism.


Another potential source of undesirable spurious signals is dispersion due to the finite frequency response of the thresholding gain mechanism. This finite frequency response will, in general, cause the pulse to spread in time. The spreading could introduce spurious signals at the leading and trailing edges of internally replicated pulses that arrive at the gain mechanism synchronously with input pulses. This effect is portrayed in Figure 2-10, where the finite frequency response was simulated by time-correlating the signals arriving at the gain mechanism with a Gaussian dispersion pulse. In this case, a filter with a 1/e width of 0.2% of the pulse width was used.

Effects of a +0.1% Error in the Length of the X Feedback Path

Figure 2-9. The Effects of the J Input Lagging the K Input by 1%

Careful design with respect to the frequency response of the gain mechanism may, however, help to minimize spurious signals. Figures 2-1 through 2-10 show that the spurious signals appear as sharp spikes in the output. These spikes have a higher frequency content than the signal pulses, and are therefore more susceptible to suppression by the finite frequency response of the gain mechanism. Choosing a pulse width that more closely matches the response time of the gain mechanism gives better performance, as measured by the ratio of the "true" signal response to the spurious signal response.

An example of this kind of improved performance is shown in Figure 2-11. This figure shows the effects of a Gaussian filter, of width equal to the input pulse width, on the spurious signals produced by a 1% error in the length of the X feedback path. In comparison with Figure 2-1, we see that the spurious signals have been effectively eliminated.
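The suppression of narrow spikes by a finite-bandwidth gain mechanism can be illustrated with a short numerical sketch. The Python code below is not the simulation used for Figures 2-10 and 2-11. It uses a hypothetical sampled signal that contains one unit-width signal pulse and one glitch 2% as wide. The signal is convolved with a unit-area Gaussian whose full 1/e width equals the pulse width. For a symmetric Gaussian, the time correlation described above and a convolution give the same result. The function names are illustrative.

```python
import math
from typing import List


def gaussian_kernel(width_1e: float, dt: float) -> List[float]:
    """Unit-area sampled Gaussian exp(-(2t/width_1e)^2); its full 1/e width is width_1e."""
    half = int(math.ceil(2.0 * width_1e / dt))  # keep +/- 2 full widths of taps
    taps = [math.exp(-(2.0 * i * dt / width_1e) ** 2) for i in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve_same(x: List[float], kernel: List[float]) -> List[float]:
    """Discrete convolution of x with an odd-length kernel, output aligned with x."""
    h = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for m, km in enumerate(kernel):
            i = n - (m - h)
            if 0 <= i < len(x):
                acc += km * x[i]
        out.append(acc)
    return out


def pulse_and_glitch(n: int = 500) -> List[float]:
    """Unit pulse on samples 100-199 and a 2-sample glitch on samples 350-351 (dt = 0.01)."""
    return [1.0 if 100 <= i < 200 or 350 <= i < 352 else 0.0 for i in range(n)]


if __name__ == "__main__":
    dt = 0.01
    y = convolve_same(pulse_and_glitch(), gaussian_kernel(1.0, dt))
    print(f"signal peak {max(y[:300]):.3f}, glitch peak {max(y[300:]):.3f}")
```

Before filtering, both features have unit height. After filtering, the pulse keeps most of its amplitude and the glitch is reduced to a few percent.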

2.4 Effects of Mismatch Between Input Period and Round Trip Delay.

Another source of erroneous signals is a mismatch between the period of the input signals and twice the round trip delay. A rather extreme case, with no filtering, is portrayed in Figure 2-12, where the input period is 5% shorter than the internal delay time.
Effects of Matched Filtering on Spurious Signals Introduced by a 1% Error in Path Length

Effects of the Input Clock Rate 5% Higher Than the Internal Clock Rate

In general, spurious signals introduced by this pulse rate mismatch are similar to those introduced by path length errors, and can be controlled to some degree by the filtering in the gain mechanism. However, in cases where the inputs contain long strings of "zeros", synchronization with the externally clocked signals will be lost.
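A rough estimate of how quickly synchronization is lost can be made under a simple assumption. With no input "ones" to re-align the loop, the timing slip grows by the fractional period error every feedback period, and synchronization is considered lost once the accumulated slip exceeds one pulse width. The sketch below counts the periods until that happens. The function name and the numbers are hypothetical and illustrative.

```python
import math


def periods_until_slip(rel_period_error: float, duty_cycle: float) -> int:
    """Smallest number of periods k for which k * |error| exceeds the pulse width.

    Both arguments are fractions of the feedback period. Assumes the slip grows
    linearly and is never corrected (illustrative model only).
    """
    if rel_period_error == 0:
        raise ValueError("no mismatch: synchronization is never lost in this model")
    return math.floor(duty_cycle / abs(rel_period_error)) + 1


if __name__ == "__main__":
    # 5% period mismatch (as in Figure 2-12) with 25% duty-cycle pulses.
    print(periods_until_slip(0.05, 0.25))
```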
This section analyzes the effects of random noise on the choice of the weights and thresholds in the optical J-K flip-flop. Two cases are addressed. The first is the proposed implementation shown in Figure 1-1, which uses coherent destructive interference to obtain negative weights; in this case, random phase noise complicates the analysis. Before proceeding to the analysis of the proposed case, we first analyze another implementation that uses noncoherent light without interference (no negative weights). This implementation is simpler to analyze and probably more readily realizable with current technology.

3.1 Noncoherent Optical J-K Flip-Flop Analysis.

By  noncoherent,  it  is  meant  that  there  exists  no  fixed  phase 
relationship  between  the  inputs  of  the  flip-flop  and  the  feedback  paths.  For 
destructive  interference  (and  negative  weights)  a  fixed  temporal  phase 
relationship  between  the  input  pulses  of  light  and  those  being  fed  back  is  a 
necessary  condition.  In  the  noncoherent  system,  this  condition  is  not  satisfied 
and  all  weights  must  be  positive.  One  method  of  enabling  the  proper  operation 
of  the  flip-flop  with  all  positive  weights  is  to  use  inverting  logic 
incorporated  into  the  thresholding  gain  array.  The  necessary  modifications  to 
operate  in  an  incoherent  mode  are  shown  in  Figure  3-1. 

The problem of choosing the proper weights and thresholds in any implementation is one of choosing the weights and thresholds that give the least chance of an erroneous output in the presence of noise at the inputs. To simplify the analysis, we note that each output of the threshold gain array can be modeled independently using the simplified model shown in Figure 3-2. In this model, two inputs are weighted, summed, and compared with a threshold. The relationship between the output and the inputs is analogous to an "AND" or an "OR" gate, depending on where the threshold is set.
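The simplified gate of Figure 3-2 can be written in a few lines. The Python sketch below uses hypothetical names and noiseless inputs. It applies unit weights with the thresholds 1/2 and 3/2 that this section arrives at in Equations (3-21) and (3-22). The same weighted sum then acts as an "OR" or an "AND" gate depending only on the threshold.

```python
def threshold_gate(a: float, b: float, w_a: float, w_b: float, threshold: float) -> int:
    """Simplified gate of Figure 3-2: weight the inputs, sum them, compare with a threshold."""
    return 1 if w_a * a + w_b * b > threshold else 0


if __name__ == "__main__":
    print("a b | T=1/2 (OR-like) | T=3/2 (AND-like)")
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "|", threshold_gate(a, b, 1, 1, 0.5), "|", threshold_gate(a, b, 1, 1, 1.5))
```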
Figure 3-1. Noncoherent Implementation of Optical J-K Flip-Flop

To further simplify the analysis, we will assume the inputs can be modeled as one of two equiprobable input levels, as given in Equations (3-1) through (3-4).

$$a = A_0 + n_A \quad (\text{with probability } P_{A0} = 1/2) \tag{3-1}$$

$$a = A_1 + n_A \quad (\text{with probability } P_{A1} = 1/2) \tag{3-2}$$

$$b = B_0 + n_B \quad (\text{with probability } P_{B0} = 1/2) \tag{3-3}$$

$$b = B_1 + n_B \quad (\text{with probability } P_{B1} = 1/2) \tag{3-4}$$


In  Equations  (3-1)  through  (3-4),  a  and  b  are  the  inputs  to  the 
"gate",  A  and  B  are  the  input  levels  corresponding  to  a  "mark"  input  or  "zero" 
input  depending  on  the  subscript,  and  the  n's  are  noise  terms.  In  this  analysis 
the  noise  terms  will  be  modeled  as  zero  mean  Gaussian  distributed  noise.  The 
Gaussian  probability  density  function  is  given  by  Equation  (3-5). 


$$f(n) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-n^2/2\sigma^2} \tag{3-5}$$


where σ² is the variance of the noise. This model for the noise is only a rough approximation, since it implies the possibility of negative inputs; however, we will use it because it simplifies the analysis.


From  the  relations  in  Equations  (3-1)  through  (3-4)  the  intermediate 
results  shown  in  Equations  (3-6)  through  (3-9)  can  be  derived. 


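$$c' = C_{00} + n_C \quad (\text{with probability } P_{00} = 1/4), \qquad C_{00} = A_0 W_A + B_0 W_B \tag{3-6}$$

$$c' = C_{01} + n_C \quad (\text{with probability } P_{01} = 1/4), \qquad C_{01} = A_0 W_A + B_1 W_B \tag{3-7}$$

$$c' = C_{10} + n_C \quad (\text{with probability } P_{10} = 1/4), \qquad C_{10} = A_1 W_A + B_0 W_B \tag{3-8}$$

$$c' = C_{11} + n_C \quad (\text{with probability } P_{11} = 1/4), \qquad C_{11} = A_1 W_A + B_1 W_B \tag{3-9}$$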


Here c' is the weighted sum of the inputs (as shown in Figure 3-2), and n_C = W_A n_A + W_B n_B. The noise term is again Gaussian distributed. Assuming the input noise terms are independent, its variance equals the sum of the variances of the input noise densities, each scaled by the square of the appropriate weight.
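The statement about the variance of the summed noise can be checked numerically. Assuming independent, zero-mean Gaussian input noise, the variance of n_C = W_A n_A + W_B n_B is W_A²σ_A² + W_B²σ_B². The Python sketch below compares that expression with a Monte Carlo estimate for hypothetical weights and noise levels.

```python
import random


def weighted_noise_variance(w_a: float, w_b: float, var_a: float, var_b: float) -> float:
    """Variance of n_C = W_A n_A + W_B n_B for independent, zero-mean noise terms."""
    return w_a ** 2 * var_a + w_b ** 2 * var_b


def sampled_variance(w_a: float, w_b: float, sigma_a: float, sigma_b: float,
                     n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same variance (illustrative check only)."""
    rng = random.Random(seed)
    xs = [w_a * rng.gauss(0.0, sigma_a) + w_b * rng.gauss(0.0, sigma_b) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


if __name__ == "__main__":
    # Hypothetical values: W_A = 1, W_B = 0.5, sigma_A = 0.1, sigma_B = 0.2.
    print(weighted_noise_variance(1.0, 0.5, 0.1 ** 2, 0.2 ** 2))
    print(sampled_variance(1.0, 0.5, 0.1, 0.2))
```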


The probability density functions for the four possibilities given in Equations (3-6) through (3-9) are given in Equations (3-10) through (3-13).


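$$f_{00}(c') = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(c' - C_{00})^2/2\sigma^2} \tag{3-10}$$

$$f_{01}(c') = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(c' - C_{01})^2/2\sigma^2} \tag{3-11}$$

$$f_{10}(c') = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(c' - C_{10})^2/2\sigma^2} \tag{3-12}$$

$$f_{11}(c') = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(c' - C_{11})^2/2\sigma^2} \tag{3-13}$$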


A graphical representation of these probability density functions is shown in Figure 3-3. This figure also shows an example of the derivation of the probability of error for a threshold set for "OR" gate operation (the corresponding figure for "AND" gate operation is Figure 3-4). The total probability of error has two parts. The first is the area under the top curve to the right of the threshold, multiplied by its a priori probability (given in Equation (3-6)). The second is the sum of the areas under the lower three curves to the left of the threshold, each multiplied by its corresponding a priori probability. Expressions for the total error probability are given in Equations (3-14) and (3-15).
Figure 3-3. Probability Density Functions at Input to Threshold and Probability of Error Calculation For "OR" Operation
Figure 3-4. Probability Density Functions at Input to Threshold and Probability of Error Calculation For "AND" Operation
$$P_e = P_{00} P_{e00} + P_{01} P_{e01} + P_{10} P_{e10} + P_{11} P_{e11} \tag{3-14}$$

$$P_e = P_{00} \int_{T}^{\infty} f_{00}(c')\,dc' + P_{01} \int_{-\infty}^{T} f_{01}(c')\,dc' + P_{10} \int_{-\infty}^{T} f_{10}(c')\,dc' + P_{11} \int_{-\infty}^{T} f_{11}(c')\,dc' \tag{3-15}$$
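Because each density in Equations (3-10) through (3-13) is Gaussian, every integral in Equation (3-15) reduces to a Gaussian tail probability. The Python standard library can evaluate this tail through math.erfc. The sketch below assumes equiprobable input pairs (P_ij = 1/4) and a common noise standard deviation σ, and its function names are hypothetical. It handles either gate type by listing which input pairs should produce a "one".

```python
import math
from typing import Dict, Tuple

Pair = Tuple[int, int]


def gaussian_tail(x: float) -> float:
    """Q(x) = P(N(0, 1) > x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probability(levels: Dict[Pair, float], ideal: Dict[Pair, int],
                      threshold: float, sigma: float) -> float:
    """Total error probability of one threshold gate, in the spirit of Eq. (3-15).

    levels[(i, j)] is the noiseless weighted sum C_ij, ideal[(i, j)] the correct
    output, and every input pair is assumed to occur with probability 1/4.
    """
    total = 0.0
    for pair, c in levels.items():
        if ideal[pair] == 0:
            total += 0.25 * gaussian_tail((threshold - c) / sigma)  # noise lifts c' above T
        else:
            total += 0.25 * gaussian_tail((c - threshold) / sigma)  # noise drops c' below T
    return total


if __name__ == "__main__":
    # Unit weights with A0 = B0 = 0 and A1 = B1 = 1, as assumed in Eqs. (3-17) to (3-22).
    levels = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
    or_ideal = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
    and_ideal = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
    sigma = 0.2  # hypothetical noise level
    print("OR :", error_probability(levels, or_ideal, 0.5, sigma))
    print("AND:", error_probability(levels, and_ideal, 1.5, sigma))
```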


The  problem  of  selecting  a  threshold  then  becomes  one  of  minimizing 
the  expression  given  in  Equation  (3-15).  Mathematically,  this  is  done  by  taking 
the  derivative  of  the  probability  of  error  with  respect  to  the  threshold  value 
and  setting  the  resulting  expression  to  zero.  The  result  of  these  operations  is 
given  in  Equation  (3-16). 


-(T  -  Cm)2/2o2 


-(C01  -  T)2/2o2 


(C10  -  T)2/2o2 


*  P00e 


+  P01e 


+  P10e 


-(Cn  -  T)2/2o2 


.  Pu« 


=  0 
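Equation (3-16) has no simple closed-form solution in general, but it can be solved numerically. The Python sketch below applies bisection to the left-hand side of Equation (3-16) for the "OR" case, with unit weights and equal priors. The function names and noise levels are illustrative assumptions. For small σ the root lies slightly below 1/2, by roughly σ² ln 2, because three of the four equiprobable input pairs should produce a "one".

```python
import math


def threshold_condition(t: float, c: tuple, sigma: float,
                        p: tuple = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Left-hand side of Eq. (3-16) for "OR" operation (common factor dropped).

    c = (C00, C01, C10, C11); p = (P00, P01, P10, P11).
    """
    def g(level: float) -> float:
        return math.exp(-(t - level) ** 2 / (2.0 * sigma ** 2))
    return -p[0] * g(c[0]) + p[1] * g(c[1]) + p[2] * g(c[2]) + p[3] * g(c[3])


def optimum_threshold(c: tuple, sigma: float, lo: float, hi: float, iters: int = 100) -> float:
    """Solve Eq. (3-16) for T by bisection on a bracket [lo, hi] with a sign change."""
    f_lo = threshold_condition(lo, c, sigma)
    if f_lo * threshold_condition(hi, c, sigma) > 0:
        raise ValueError("bracket does not contain a root of Eq. (3-16)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = threshold_condition(mid, c, sigma)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    levels = (0.0, 1.0, 1.0, 2.0)  # unit weights, A0 = B0 = 0, A1 = B1 = 1
    for sigma in (0.05, 0.1, 0.2):
        print(sigma, optimum_threshold(levels, sigma, 0.0, 1.0))
```

With σ = 0.1 the bisection returns T ≈ 0.4931, close to the value T_OR = 1/2 used in Equation (3-21).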


Equation (3-16) not only yields a direct method of selecting the optimum threshold, but also gives some insight into the problem of choosing the optimum weights. The total probability of error in Equation (3-15) can be minimized with respect to the weights by minimizing the (negative) arguments of the exponentials in Equation (3-16). This amounts to maximizing the separation of each level C_ij from the threshold relative to σ. At first sight the choice of weights seems rather arbitrary, since the noise and the signal are weighted equally. However, additional noise may be introduced after the weighting (particularly in the detection process), so the weights should be chosen to maximize the difference between the mark and zero levels. Given that the weights are passive, the choices given in Equations (3-17) and (3-18) are probably the best.
$$W_A = 1 \tag{3-17}$$

$$W_B = 1 \tag{3-18}$$


Equations  (3-17)  and  (3-18)  assume  that  the  input  amplitudes  are 
equal.  Should  this  not  be  the  case,  the  weight  should  be  chosen  so  as  to 
equalize  the  inputs.  For  example,  in  the  case  where  the  b  input  is  of  greater 
amplitude  than  the  a  input,  the  weights  should  be  chosen  as  given  in 
Equations  (3-19)  and  (3-20). 


$$W_A = 1 \tag{3-19}$$

$$W_B = A_1/B_1 \tag{3-20}$$


With the weights given in Equations (3-17) and (3-18), the thresholds corresponding to "OR" and "AND" operation are given in Equations (3-21) and (3-22), respectively (scaled to unit amplitude inputs).

$$T_{OR} = 1/2 \quad (\text{assuming } A_0 = B_0 = 0 \text{ and } A_1 = B_1 = 1) \tag{3-21}$$

$$T_{AND} = 3/2 \quad (\text{assuming } A_0 = B_0 = 0 \text{ and } A_1 = B_1 = 1) \tag{3-22}$$
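The effect of the equalizing weights in Equations (3-19) and (3-20) can be seen by tabulating the noiseless weighted sums C_ij. The Python sketch below assumes A_0 = B_0 = 0 and a hypothetical b input twice as strong as the a input. It places each threshold at the midpoint between adjacent levels. This midpoint rule is a simplification that reproduces Equations (3-21) and (3-22) when A_1 = 1; it is not the exact optimum of Equation (3-16).

```python
from typing import Dict, Tuple


def equalizing_weights(a1: float, b1: float) -> Tuple[float, float]:
    """Eqs. (3-19)/(3-20): W_A = 1 and W_B = A1/B1, so both weighted 'marks' equal A1."""
    return 1.0, a1 / b1


def weighted_levels(a1: float, b1: float, w_a: float, w_b: float) -> Dict[Tuple[int, int], float]:
    """Noiseless sums C_ij = A_i W_A + B_j W_B, assuming A0 = B0 = 0."""
    return {(i, j): i * a1 * w_a + j * b1 * w_b for i in (0, 1) for j in (0, 1)}


def midpoint_thresholds(levels: Dict[Tuple[int, int], float]) -> Tuple[float, float]:
    """OR threshold halfway between C00 and C01; AND threshold halfway between C10 and C11."""
    t_or = 0.5 * (levels[(0, 0)] + levels[(0, 1)])
    t_and = 0.5 * (levels[(1, 0)] + levels[(1, 1)])
    return t_or, t_and


if __name__ == "__main__":
    w_a, w_b = equalizing_weights(1.0, 2.0)  # hypothetical: b input twice as strong as a
    levels = weighted_levels(1.0, 2.0, w_a, w_b)
    print(w_b, levels, midpoint_thresholds(levels))
```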


3.2 Coherent Optical J-K Flip-Flop Analysis.


As noted, the primary difference between the noncoherent and coherent implementations of the optical J-K flip-flop is that the coherent implementation does not require the use of inverted logic signals. Instead, the inverted logic is achieved by the use of negative weighting in the weighted interconnect array. To achieve the negative weighting, the holographic interconnect array shifts the phase of one signal with respect to a second so as to produce destructive interference at the thresholding gain array.
To permit destructive interference, all of the signals arriving at the holographic interconnect must maintain a fixed phase relationship with one another. The practical implications of this constraint are fairly imposing. Some of these implications are as follows:

A.  The  J  and  K  inputs  must  maintain  a  constant  phase  between 
successive  pulses.  In  practice  this  is  not  easy  to  achieve. 

This  means  that  some  forms  of  modulation  of  the  inputs  (e.g. 
direct  modulation)  are  not  suitable.  In  addition,  it  seems  that 
both  inputs  should  be  derived  from  the  same  source.  And  finally, 
the  coherence  time  of  the  source  laser  must  be  much  longer  than 
the  length  of  time  that  any  signal  remains  inside  the  flip-flop. 

B.  The  lengths  of  the  feedback  paths  must  be  accurately  controlled 
to  fractions  of  a  wavelength.  In  order  to  provide  this  accuracy 
the  flip-flop  must  be  immune  to  any  environmental  sources  of 
vibration  and/or  temperature  dependent  path  length  errors. 

Again,  these  problems  are  difficult  to  eliminate. 

C.  The  thresholding  gain  mechanism  must  not  only  have  the  desirable 
thresholding  characteristics,  but  must  also  be  able  to  both 
detect  the  relative  phase  of  two  signals  and  produce  an  output 
signal  with  a  fixed  phase  relative  to  one  of  the  two  signals.  In 
practice,  there  are  devices  capable  of  detecting  the  relative 
phase  (interference)  of  two  signals  (e.g.  square  law  detector); 
and  there  are  devices  capable  of  replicating  the  phase  of  an 
input  signal  (e.g.  laser  amplifier);  but  there  is  no  obvious 
simple  mechanism  for  performing  both  these  functions  in  addition 
to  thresholding. 
These are interesting problems for research and development. In order to proceed with the analysis, we will assume that they can be resolved. Thus, we will assume that the operation of the coherent optical J-K flip-flop is not corrupted by any systematic errors, but only by random noise components similar to those discussed in reference to the noncoherent case.

The simplified model of the weighted interconnect array and thresholding gain medium for the coherent case is shown in Figure 3-5. This model differs from that of the noncoherent case (Figure 3-1) in two ways: first, the thresholding gain mechanism does not output an inverted signal; and second, the signals as well as the weights are complex (i.e., they possess both magnitude and phase).

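For convenience, we will represent the signals as phasors, as shown in Equations (3-23) and (3-24).

$$a = A + n_A, \qquad A = |A|\,e^{i\omega t}e^{i\theta_A}, \qquad n_A = |n_A|\,e^{i\omega t}e^{i\phi_A} \tag{3-23}$$

$$b = B + n_B, \qquad B = |B|\,e^{i\omega t}e^{i\theta_B}, \qquad n_B = |n_B|\,e^{i\omega t}e^{i\phi_B} \tag{3-24}$$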


In this case, the signal phasors (A and B) can be modulated in either phase or amplitude; for this analysis, however, we will assume that the signals use on-off modulation with equiprobable states. In addition, we will assume that the carrier frequencies are perfectly matched, and thus neglect the time variation for the remainder of the analysis. Finally, we will assume that the real and imaginary parts of the noise terms can be represented as independent, zero-mean Gaussian random variables of equal variance. Equivalently, the magnitude of the noise term is Rayleigh distributed and its phase angle is uniformly distributed between 0 and 2π.
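The equivalence between Gaussian real and imaginary noise parts on one side, and a Rayleigh magnitude with uniform phase on the other, can be checked by simulation. The Python sketch below draws complex noise with independent N(0, σ²) components, using an arbitrary σ = 1 and sample size. It compares the sample statistics with the Rayleigh mean σ√(π/2), with the mean squared magnitude 2σ², and with the 1/4 probability that a uniformly distributed phase falls in the first quadrant. The function name is hypothetical.

```python
import cmath
import math
import random
from typing import Tuple


def noise_statistics(n: int = 100_000, sigma: float = 1.0, seed: int = 0) -> Tuple[float, float, float]:
    """Draw complex noise with independent N(0, sigma^2) real and imaginary parts.

    Returns (mean magnitude, mean squared magnitude, fraction of phases in [0, pi/2)).
    """
    rng = random.Random(seed)
    zs = [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]
    mean_mag = sum(abs(z) for z in zs) / n
    mean_sq = sum(abs(z) ** 2 for z in zs) / n
    first_quadrant = sum(1 for z in zs if 0.0 <= cmath.phase(z) < math.pi / 2) / n
    return mean_mag, mean_sq, first_quadrant


if __name__ == "__main__":
    mag, sq, q1 = noise_statistics()
    # Theory: E|n| = sigma*sqrt(pi/2), E|n|^2 = 2*sigma^2, P(first quadrant) = 1/4.
    print(mag, math.sqrt(math.pi / 2), sq, q1)
```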

As stated, the weights for the coherent case are complex, as given in Equations (3-25) and (3-26).

$$W_A = |W_A|\,e^{i\delta_A} \tag{3-25}$$

$$W_B = |W_B|\,e^{i\delta_B} \tag{3-26}$$


An example of the effect of weighting on an input signal is shown, in phasor representation, in Figure 3-6. This figure shows the effect of a weight of unit magnitude and 180° phase shift on an arbitrary signal with noise. The results of weighting the input signals are given in Equations (3-27) and (3-28).


$$a' = A' + n_A', \qquad A' = |A'|\,e^{i\theta_A'} = |A|\,|W_A|\,e^{i(\theta_A + \delta_A)} \tag{3-27}$$

$$n_A' = |n_A'|\,e^{i\phi_A'} = |n_A|\,|W_A|\,e^{i(\phi_A + \delta_A)} \tag{3-28}$$
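Complex weighting is simply multiplication of phasors. The Python sketch below uses the standard cmath module to apply a weight of unit magnitude and 180° phase shift, as in Figure 3-6, to a hypothetical input phasor with |A| = 1 and θ_A = 30°. Adding the weighted copy to the original then produces the destructive interference used to realize negative weights. The function name is hypothetical.

```python
import cmath
import math


def apply_weight(signal: complex, magnitude: float, phase_shift: float) -> complex:
    """Multiply a phasor by the complex weight W = |W| e^{i*delta} (cf. Eqs. 3-27, 3-28)."""
    return signal * cmath.rect(magnitude, phase_shift)


if __name__ == "__main__":
    a = cmath.rect(1.0, math.radians(30.0))     # hypothetical phasor: |A| = 1, theta_A = 30 deg
    a_weighted = apply_weight(a, 1.0, math.pi)  # unit magnitude, 180 deg shift (Figure 3-6)
    print(abs(a_weighted), math.degrees(cmath.phase(a_weighted)))  # about 1 and -150 deg
    print(abs(a + a_weighted))                  # about 0: destructive interference
```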
Figure 3-6. Phasor Representation of Complex Weighting of Input Signal

This figure shows the effects of complex weighting (magnitude and phase) on an arbitrary input signal with a noise component. In this example, the holographic weighting shifts the phase angles by 180° and scales the magnitudes by unity.
Figure 3-7. Phasor Representation of the Sum of Two Arbitrary Signals With Noise Components


$$n_B' = x_B + i\,y_B, \qquad x_B = |n_B'|\cos\phi_B', \qquad y_B = |n_B'|\sin\phi_B' \tag{3-32}$$


Using these transformations, the vector sum can be expressed as given in Equation (3-33).

$$c' = X_C + x_C + i\,(Y_C + y_C), \qquad X_C = X_A + X_B, \quad x_C = x_A + x_B, \quad Y_C = Y_A + Y_B, \quad y_C = y_A + y_B \tag{3-33}$$


Here, the real and imaginary components of the noise (x_C and y_C) are still zero-mean Gaussian random variables. Each has a variance equal to the sum of the component variances, weighted by the squared magnitude of the corresponding weight.


The squared magnitude of the sum is given by Equation (3-34).

$$|c'|^2 = c'\,c'^{*} = (X_C + x_C)^2 + (Y_C + y_C)^2 \tag{3-34}$$


The squared magnitude given in Equation (3-34) is a random variable with a second-order (two degrees of freedom) noncentral chi-squared density, given by the density function in Equation (3-35).




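$$f(|c'|^2) = \frac{1}{2\sigma^2}\, e^{-(|c'|^2 + C^2)/2\sigma^2}\, I_0\!\left(\frac{C\,|c'|}{\sigma^2}\right) \tag{3-35}$$

In this equation, σ² is the noise variance (of each of x_C and y_C). C² = X_C² + Y_C² is the mean square value of the noiseless sum, a function of the weights and input amplitudes. I_0 is the modified Bessel function of the first kind of order zero. A graphical example of the chi-squared density function is given in Figure 3-8.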
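The density in Equation (3-35) is the standard noncentral chi-squared density with two degrees of freedom, written for |c'|² with per-component noise variance σ². The Python sketch below evaluates it using a power series for I_0, then runs three numerical checks. It checks that the density integrates to one, that its mean is C² + 2σ², and that with no signal it reduces to an exponential whose tail beyond a threshold T is exp(-T/2σ²). The values σ = 0.3, C² = 1, and T = 0.5 are arbitrary, and the helper names are hypothetical.

```python
import math
from typing import Callable


def bessel_i0(x: float) -> float:
    """Modified Bessel function I0(x) from its power series (fine for moderate x)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def nc_chi2_2dof_pdf(s: float, c2: float, sigma: float) -> float:
    """Density of s = |c'|^2 (Eq. 3-35): noncentral chi-squared, two degrees of freedom.

    c2 is C^2, the squared magnitude of the noiseless sum; sigma^2 is the variance
    of each (real or imaginary) noise component.
    """
    if s < 0.0:
        return 0.0
    v = sigma ** 2
    return math.exp(-(s + c2) / (2.0 * v)) * bessel_i0(math.sqrt(c2 * s) / v) / (2.0 * v)


def integrate(f: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Composite trapezoidal rule."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


if __name__ == "__main__":
    sigma, c2 = 0.3, 1.0  # hypothetical noise level and signal power
    print(integrate(lambda s: nc_chi2_2dof_pdf(s, c2, sigma), 0.0, 10.0))      # total probability
    print(integrate(lambda s: s * nc_chi2_2dof_pdf(s, c2, sigma), 0.0, 10.0))  # mean, C^2 + 2 sigma^2
    # With no signal (C = 0), P(|c'|^2 > T) should equal exp(-T / (2 sigma^2)).
    t = 0.5
    print(integrate(lambda s: nc_chi2_2dof_pdf(s, 0.0, sigma), t, 10.0), math.exp(-t / (2 * sigma ** 2)))
```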




At this point, expressions for the probability of error and the optimum threshold could be derived analogous to those for the noncoherent case given in Equations (3-15) and (3-16). Instead, preferred weights and thresholds will be given based on more general arguments similar to those used at the end of Section 3.1.


The  optical  J-K  flip-flop  performs  three  logic  operations.  The 
equivalent  logic  element,  truth  table,  optimum  weight  selection  and  optimum 
threshold  selection  are  shown  for  each  of  the  three  cases  in  Figures  3-9  through 
3-11.  The  optimum  weights  were  selected  by  maximizing  the  separation  between 
the  peaks  of  the  density  functions;  and  the  optimum  thresholds  were  selected  by 
minimizing  the  shaded  areas  under  the  density  functions. 

An  important  observation  to  be  made  is  that  the  errors  are  much  less 
probable  for  the  "AND"  gate  shown  in  Figure  3-11  than  for  either  of  the  other 
gates  shown  in  Figures  3-9  and  3-10.  Two  other  "preferred"  gates  are  shown  in 
Figures  3-12  and  3-13.  In  general,  use  of  combinations  of  the  "preferred"  gates 
shown  in  Figures  3-11  through  3-13  will  help  minimize  errors.  For  the  J-K 
flip-flop,  however,  exclusive  use  of  these  gates  would  require  many  more  gates 
for  the  same  operation. 
Figure 3-9. Optimum Weights and Threshold For X Logic Operation (panel d: Density Functions and Threshold Selection)

Figure 3-13. Optimum Weights and Threshold For "AND" Logic Operation (panel d: Density Functions and Threshold Selection)
Introduction


This report covers work accomplished on contract #RI-39898 during the period July 1, 1985 through December 31, 1985. However, because of the late arrival of the contract, it was not possible to engage graduate students in the work until the beginning of the Fall quarter, October 1985. The work reported therefore took place primarily between 1 October 1985 and the time of this writing, 15 December 1985. Work is now well under way, and a no-cost extension has been requested to make up for the lack of available personnel during the summer months.

Our purpose here is to summarize the technical progress on this contract. We do so in three parts. First, we present a general discussion of the motivation behind the use of optics as an interconnect medium. Next comes a discussion of the advantages of dynamic, or time-changeable, interconnections from the point of view of computation and computer architecture. Third, we describe several architectures we have conceived of during the course of this work, one of which in particular appears to merit further consideration. This section also includes a description of an experiment now under preparation in our laboratories. Lastly, we present some administrative statistics pertinent to the contract.

Why  Optics  for  Interconnections? 

A growing interest in the potential advantages of optics as an interconnect medium at various levels of computer architecture is now evident.
Already we see optics penetrating the problem of machine-to-machine interconnections, with the commercial advent of fiber-optic local area networks. The next logical step is penetration into the next lower layer of architecture, namely the problem of module-to-module communication within a single machine. In this context a module may represent a separate processor, a memory unit, or a fast peripheral device. The modern trend towards multiprocessor architectures is placing ever higher demands on the communication capabilities within such machines.

Interest in optics for solving such communication problems stems from several sources, but the most important is the relative freedom of streams of photons from interference with one another. Two streams of electrons in close proximity inevitably influence one another, generating crosstalk; light waves exhibit no such mutual influence. Indeed, it is even possible to pass two beams of light through one another without any measurable mutual interaction.

Other advantages for optics can also be cited, such as lower required drive power than electronic connections of comparable performance, but it can be shown that these advantages are generally dependent on the issue of mutual interference as well. For example, arbitrarily low crosstalk between electronic interconnections can in principle be achieved if sufficient shielding is used, but such shielding increases the capacitance associated with the interconnection, thereby increasing the drive power required for the communication link.
A major goal of the work performed under this contract is the discovery of novel means for achieving dynamic optical interconnections. By "dynamic" interconnections we mean interconnections that can be rapidly changed, while still supporting high-speed communications. The realization of such techniques opens up the possibility of constructing computers having architectures that can be altered at will, with the changes taking place at rather high speeds. The vision is one of an optically reconfigurable computer, the wiring of which can be changed rapidly to meet the needs of the particular problem at hand. In the section that follows, we elaborate on the needs for reconfigurable architectures in computing.

Why  Reconfigurable  Architectures? 

Modern trends in computer architecture emphasize increased computing power through parallel processing structures. It is recognized, however, that efficient computation on a parallel architecture requires a matching of that architecture to the structure of the problem at hand. Such a matching of the available computational resources to the structure of the problem requires a dynamically reconfigurable architecture, unless the computer is special-purpose and will always be used for problems of exactly the same structure. The need for dynamically reconfigurable interconnection networks has been emphasized in a recent review article in the computer literature (S. Yalamanchili and J.K. Aggarwal, "Reconfiguration strategies for parallel architectures", IEEE Computer, Vol. 18, No. 12, pp. 44-61, December 1985).

Some appreciation for the need for reconfigurable interconnect networks can be gained by considering some very specific computational problems. We present three such problems here, merely as examples of a much wider range of applications. The first two examples are taken from the work of W.D. Hillis, the chief architect of the computer known as the "connection machine" (see W.D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985). The machine is currently under development by Thinking Machines, Inc., with DARPA support. The connection machine itself is a collection of a very large number (e.g. 64,000) of small and simple processors, all of which are connected in the simplest possible fashion, to their nearest neighbors. However, a virtual interconnect network can be set up, in which messages are passed between processors in such a way as to simulate any desired architecture. The price paid for having physical connections only to nearest neighbors is an increased time for interprocessor communication, due to the time taken to pass messages from sources to destinations. Thus, for any of the problems discussed in what follows, there is a great advantage in computational speed if the interconnections can be established physically rather than virtually. Hence the motivation for a dynamically reconfigurable optical interconnection network.

The first example is drawn from the field of VLSI simulation. The design of VLSI circuits depends heavily on computer simulation to predict the performance of such circuits before they are actually fabricated. Design flaws can be detected at the simulation stage and corrected before the expensive fabrication steps are undertaken. Such simulations are computationally intensive. Their speed can be increased by employing a multitude of processors, each working on a portion of the simulation problem. However, the interactions between the various portions of the VLSI circuit are important, and therefore it is essential that the connections between processors incorporate the constraints inherent in the actual circuit interconnections. At an extreme, a single processor can be used to simulate the performance of each transistor in the design, and the interconnections between processors can mimic the interconnections between the transistors in the circuit under simulation. Obviously, if the multiprocessor computer is to be used to simulate a variety of circuits, some means must be available for changing the interconnections between processors to reflect the various interconnections present in different integrated circuits. Thus the architecture must be reconfigurable, although the rate at which reconfiguration must take place is far smaller than the rates at which communications take place between processors. Figure 1 illustrates the close ties between the VLSI architecture being simulated and the computer architecture used for the simulations.
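To make the processor-per-element idea concrete, the Python sketch below (our illustration, not part of the original study) assigns one hypothetical "processor" to each logic gate of a small circuit and lets each gate's list of input names play the role of the physical interconnect. Using gates rather than transistors, the names `simulate` and `FULL_ADDER`, and the synchronous update rule are all assumptions chosen for brevity.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical netlist format: node name -> (logic function, names of its input nodes).
# Each entry stands for one "processor"; the input-name lists are the interconnect.
Netlist = Dict[str, Tuple[Callable[..., int], List[str]]]

FULL_ADDER: Netlist = {
    "s1": (lambda a, b: a ^ b, ["a", "b"]),
    "c1": (lambda a, b: a & b, ["a", "b"]),
    "sum": (lambda s, c: s ^ c, ["s1", "cin"]),
    "c2": (lambda s, c: s & c, ["s1", "cin"]),
    "cout": (lambda p, q: p | q, ["c1", "c2"]),
}

def simulate(netlist: Netlist, inputs: Dict[str, int], max_steps: int = 50) -> Dict[str, int]:
    """Synchronous parallel update: every gate-processor reads its neighbours'
    previous outputs and all of them update at once, until nothing changes."""
    state = {name: 0 for name in netlist}
    state.update(inputs)
    for _ in range(max_steps):
        new_state = dict(state)
        for name, (gate, sources) in netlist.items():
            new_state[name] = gate(*(state[src] for src in sources))
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("circuit did not settle (feedback loop or too few steps)")

result = simulate(FULL_ADDER, {"a": 1, "b": 1, "cin": 1})
print(result["sum"], result["cout"])  # 1 1
```

Reconfiguring the machine for a different circuit amounts to supplying a different netlist dictionary, which happens once per circuit, while the reads between connected processors happen on every step.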

A second example is drawn from the field of artificial intelligence, in particular the problem of searching semantic networks. A semantic network is a labeled, directed graph in which each vertex represents a concept and each edge represents a relation between concepts. For example, "apple" is a concept, and both "fruit" and "computer" are concepts. Apple is connected to fruit and to computer through a relation named "is-a". Thus the term "apple" refers to either a fruit or a computer. Similarly, the concept "red" can be linked to the concept "apple" through the relation "color", but no such link need exist between "computer" and "red" (unless one has a red computer).

The problem to be attacked is the deduction of knowledge from such a semantic network. To discover specific knowledge it is necessary to search for relations between concepts of interest. One approach to this problem is to assign a single processor to each concept, and to interconnect those processors in a topology reflecting the specific semantic network under consideration. Obviously, each different semantic network to be searched requires a different interconnection topology; hence the need for reconfigurable interconnections. Again, the rates required for reconfiguration are much slower than the communication rates needed in a specific configuration. Figure 2 illustrates the parallel between the structure of a semantic network and the structure of the processor architecture applied to it.
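A minimal Python sketch of such a search, assuming each concept is a processor and each labeled edge a physical link: markers spread outward one hop per round, so the number of communication rounds equals the length of the relation chain. The toy network reuses the concepts from the text; the extra "fruit is-a food" link and the function names `build_network` and `find_path` are hypothetical, added only to show a two-hop query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (source concept, relation, target concept)
Network = Dict[str, List[Tuple[str, str]]]

def build_network(links: List[Link]) -> Network:
    """Adjacency list: concept -> [(relation, neighbouring concept), ...]."""
    net: Network = defaultdict(list)
    for source, relation, target in links:
        net[source].append((relation, target))
    return net

def find_path(net: Network, start: str, goal: str) -> List[Link]:
    """Breadth-first marker passing. In each round every marked concept
    (processor) forwards a marker over all of its outgoing links at once.
    Returns the chain of links from start to goal, or [] if none exists."""
    parent: Dict[str, Tuple[str, str]] = {}
    visited = {start}
    frontier = [start]
    while frontier and goal not in visited:
        next_frontier = []
        for node in frontier:  # one parallel communication round
            for relation, target in net.get(node, []):
                if target not in visited:
                    visited.add(target)
                    parent[target] = (node, relation)
                    next_frontier.append(target)
        frontier = next_frontier
    if goal not in visited or goal == start:
        return []
    path: List[Link] = []
    node = goal
    while node != start:
        source, relation = parent[node]
        path.append((source, relation, node))
        node = source
    return path[::-1]

NET = build_network([
    ("apple", "is-a", "fruit"),
    ("apple", "is-a", "computer"),
    ("apple", "color", "red"),
    ("fruit", "is-a", "food"),  # hypothetical extra link for a two-hop query
])
print(find_path(NET, "apple", "food"))
# [('apple', 'is-a', 'fruit'), ('fruit', 'is-a', 'food')]
```

Searching a different semantic network only changes the link list passed to `build_network`, which corresponds to the slow reconfiguration step; the per-round marker passing is the fast communication step.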

The last example is drawn from the area of neural networks and neural computing (see, for example, J.J. Hopfield and D.W. Tank, "Neural computation of decisions in optimization problems", Biol. Cybern., to appear). There is currently much interest in the use of neural networks for solving problems of high computational complexity. Such networks, in their most common form, consist of the following elements: an array of neurons, which are elements that accept a sum of many analog inputs but have allowable states of only 0 or 1; and a complex interconnection network between those neurons, with neuron i connected to neuron j with a weight T_ij. In optical versions of such networks, the realization is based on a parallel matrix-vector multiplier, in which the transmission of the ij-th matrix element represents T_ij, and nonlinear elements are included in parallel feedback loops to force the neural states to be binary (see Figure 3 and N.H. Farhat, D. Psaltis, A. Prata and E. Paek, "Optical implementation of the Hopfield model", Applied Optics, Vol. 24, p. 1469, 1985).

Figure 1. VLSI simulation with multiple processors

Figure 2. Searching a semantic network with a parallel computer architecture

The problem to be solved by the neural network determines the interconnections required of the matrix. Thus for each new problem to be solved, a different interconnect structure is needed. Once again we see the need for dynamically reconfigurable interconnections, and once again the rate required for reconfiguration is slow compared with the rates of information transmission through the interconnections.
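The matrix-vector-multiply-plus-threshold loop can be sketched in a few lines of Python. This is a hedged illustration, assuming a Hopfield-style network with binary 0/1 neurons, a single stored pattern, Hebbian weights T_ij = (2V_i - 1)(2V_j - 1) with T_ii = 0, and fully parallel (synchronous) updates, which is what we assume an optical feedback loop would provide; the names `hebbian_weights`, `update` and `recall` are ours.

```python
from typing import List

Matrix = List[List[int]]

def hebbian_weights(pattern: List[int]) -> Matrix:
    """T_ij = (2V_i - 1)(2V_j - 1) for i != j and T_ii = 0, for one stored 0/1 pattern."""
    b = [2 * v - 1 for v in pattern]
    n = len(b)
    return [[0 if i == j else b[i] * b[j] for j in range(n)] for i in range(n)]

def update(T: Matrix, state: List[int]) -> List[int]:
    """One parallel pass: matrix-vector product, then a hard threshold at zero,
    mimicking the optical multiplier followed by the nonlinear feedback elements."""
    return [1 if sum(t * s for t, s in zip(row, state)) > 0 else 0 for row in T]

def recall(T: Matrix, state: List[int], max_passes: int = 10) -> List[int]:
    """Repeat parallel passes until the state stops changing."""
    for _ in range(max_passes):
        new_state = update(T, state)
        if new_state == state:
            break
        state = new_state
    return state

T = hebbian_weights([1, 0, 1, 0, 1, 0])
print(recall(T, [1, 0, 1, 0, 1, 1]))  # [1, 0, 1, 0, 1, 0]
```

Starting from the stored pattern with its last bit flipped, one parallel pass restores the pattern and a second pass confirms it is stable. Storing a different pattern means rewriting T, the slow reconfiguration discussed above, whereas each call to `update` corresponds to one fast pass through the optical matrix.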
With the above three examples we hope we have convinced the reader of the need for reconfigurable communication networks in modern computer architecture. The remainder of this report discusses several novel approaches to realizing such networks using optics.

Some  Approaches  to  the  Dynamic  Optical  Interconnect  Problem 

In moving towards a decision as to the most promising optical approach to pursue, several ideas are worth discussing. To begin, two assumptions were adopted as ground rules. First, the interconnect problem to be attacked should be at a high level of architecture, such as the connection of processors to processors within a single machine. Thus problems of board-to-board and chip-to-chip communication were not explicitly considered. Second, we assumed, for the sake of simplicity, that the processors are laid out in a plane. This is by no means an essential assumption, for indeed the solutions to be discussed are applicable to non-planar geometries. However, the planar assumption is adopted in all of the illustrations to be presented.

With the above constraints in mind, we felt that the generic form of an ideal solution might be as shown in Figure 4. A reflective interconnect element, to be specified in more detail later, resides above the processor plane. Into that reflective element is written information that establishes a pattern of reflectance, such that a source residing on one processor and illuminating the interconnect element generates one or more reflected beams that are directed towards detectors on certain other selected processors. The optical beams so directed are modulated at high rates and convey information from the processor with the active source to the processors with the active detectors. Presumably every processor contains both an optical source and a detector. It would be highly desirable if the same sources used for transmitting information could also be used for optically writing the desired reflectance pattern into the interconnect element. Thus if processor A needs to communicate with processor B, both processors illuminate the interconnect element with their respective sources. This simultaneous activation should be capable of writing an interconnect pattern suitable for establishing the desired communication link. There are many practical problems associated with such a scheme, but as an ideal, it appears to us to be worthy of consideration. Some of the practical problems will be discussed later in this report.

Figure 4. Ideal interconnect configuration.

To move towards slightly more specific geometries, we consider the situation illustrated in Figure 5.

Figure 5. Transmissive interconnect element.

In this case we have two processor arrays, one on the left and one on the right. Each processor in each array is assumed to have an associated source and a detector in close proximity to that source. The various sources are assumed to be mutually coherent, a property that would require that the signals transmitted by the sources actually originate with a single source, with distribution to all processors, probably via optical fibers. External modulators at the processors then serve as the effective sources. To establish a communication path between processor A on the left-hand array and processor B on the right-hand array, processor A on the left illuminates the interconnect element simultaneously with processor A', also located on the left but at the position corresponding to the position of processor B on the right. The two waves from the left interfere in the interconnect element, creating a transmission hologram. Further illumination of the interconnect element with light from processor A will result in a transmitted diffracted beam that passes from the interconnect element to processor B on the right. Presumably the optical powers used in the writing phase are higher than those used in the communicating phase.
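The record-then-read principle can be checked numerically. The Python sketch below is a one-dimensional toy, assuming tilted plane waves instead of the spherical waves of the actual geometry, a medium whose transmittance is linearly proportional to the recorded intensity, and arbitrary integer tilts (5 and 12 fringe cycles across a 64-sample aperture) standing for the directions of A and A'. Reading the recorded fringes with A alone reproduces a copy of the wave from A', i.e. the diffracted beam described in the text.

```python
import cmath
import math
from typing import List

N = 64  # samples across the recording plane (arbitrary choice)

def plane_wave(tilt: int) -> List[complex]:
    """Unit-amplitude tilted plane wave; 'tilt' = fringe cycles across the aperture."""
    return [cmath.exp(2j * math.pi * tilt * n / N) for n in range(N)]

def dft_magnitude(field: List[complex], k: int) -> float:
    """Amplitude of the component leaving with tilt k (normalised DFT coefficient)."""
    return abs(sum(u * cmath.exp(-2j * math.pi * k * n / N) for n, u in enumerate(field))) / N

# Writing phase: mutually coherent waves from A and A' interfere in the element.
wave_a = plane_wave(5)          # hypothetical direction of A
wave_a_prime = plane_wave(12)   # hypothetical direction of A'
# Assumption: recorded transmittance is proportional to the fringe intensity.
transmittance = [abs(a + b) ** 2 for a, b in zip(wave_a, wave_a_prime)]

# Communication phase: A alone illuminates the recorded hologram.
readout = [t * a for t, a in zip(transmittance, wave_a)]

for k in (5, 12, N - 2, 20):
    print(k, round(dft_magnitude(readout, k), 6))
```

The output amplitudes are about 2.0 at k = 5 (undiffracted light from A), 1.0 at k = 12 (the reconstructed A' wave), 1.0 at k = 62 (the conjugate order, at tilt 2*5 - 12 = -2), and essentially zero elsewhere. Only the k = 12 term carries A's signal along the intended path; the other two orders are the usual by-products of a thin, linearly recorded hologram.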

Unfortunately, the geometry discussed so far requires two processor arrays, one on the left and one on the right. We seek solutions that allow all processors to lie in a single plane. For two processors in the same plane to communicate with each other, a reflective holographic element (rather than a transmissive one) must be generated. Unfortunately, to generate a reflective element with the right properties, a diverging spherical wave from a processor on one side of the interconnect element must interfere with a converging spherical wave coming from the opposite side of that element. A geometry for accomplishing this goal is shown in Figure 6. In this case, all the processors lie in a single plane, but sources exist on both sides of the interconnect element. The optics is assumed to generate the requisite diverging and converging waves for recording the reflective focusing element. The disadvantage of this geometry is that sources on both sides of the element must be properly controlled in order to generate the needed reflective element. A much preferred solution would allow all sources to exist on one side of the element, and indeed to be the same sources used during the communication phase.

Figure 6. Geometry for writing a reflective interconnect element.

In the section that follows, we discuss in more detail a solution to the above problem. All sources can exist on the same side of the interconnect element, but the interconnect element is required to be somewhat more complex. It is this solution that we feel has the greatest merit for future investigations.

Dynamic  Interconnect  using  Phase  Conjugation 

The approach described in the preceding section, involving lenses and a procedure for writing, sequentially, mutually coherent pairs of waves from opposite sides of the interconnect element, appears to have a plane of symmetry centered on the interconnect element. In other words, it is necessary to duplicate the processor array pattern in order to be able to establish the desired interconnect pattern. Another approach therefore suggests itself, one in which this symmetry of the optical configuration is exploited by using optical phase conjugation. It would, at least in principle, be much more attractive from a layout and implementation point of view if we could use the processor array pattern itself to establish and program the interconnect pattern. A possible configuration is shown in Figure 7.

Every processor has associated with it a source and a receiver, and located above the processor plane are a nonlinear crystal, used for establishing the interconnect pattern, and a phase conjugate reflector. The procedure is as follows. Suppose that processor A wishes to communicate with B, the receiver. To establish the connection, B sends out a spherical wave incident on the nonlinear interconnect crystal and the phase conjugate mirror. Simultaneously, two counterpropagating pump beams are incident on the phase conjugator, and a phase conjugate (backward-propagating) beam directed towards processor B results. As soon as a phase conjugate grating has been formed, B is turned off. The phase conjugate mirror has memory associated with it, in that a phase conjugate beam remains even after the source from B has stopped radiating. Subsequently the source at A is lit, and both the converging and diverging spherical waves needed to establish the interconnect pattern are incident on the nonlinear crystal. A grating is formed such that spherical waves coming from processor A are directed towards B. After the interconnect pattern has been established, communication between the two processors may commence. The nonlinear crystal also has memory, i.e. the grating persists after the writing beams are turned off. However, during communication the grating may be gradually erased, depending on the intensity of the communication waves. To establish a new pattern, the previous one is erased by illuminating it with a uniform source, and the procedure is repeated.

Figure 7. Phase conjugation geometry

The principle of operation just described needs to be investigated in detail in order to answer important questions regarding the performance of this device. During the writing and establishment of the phase conjugate beam, the nonlinear crystal is exposed as well. In fact, when both the source from B and its phase conjugate are present, a reflection hologram is set up in the nonlinear crystal. This is undesirable and can be avoided by several means. If the sensitivity of the nonlinear interconnect element is much less than that of the reflector, then the interconnect crystal is not appreciably affected during the grating formation time in the reflector. Alternatively, we may use gain in the phase conjugate reflector during recording of the interconnect pattern, together with a much less sensitive interconnect crystal, and thus avoid the formation of unwanted gratings. Signal amplification may be achieved by varying the intensity of the readout beam for the reflector.

The issue of grating storage in the nonlinear medium also needs to be addressed. Currently used materials such as BSO and BGO have only very limited storage capability; depending on the intensity of the beams used in recording and readout, the memory time may be on the order of milliseconds. In ferroelectrics, on the other hand, the grating can be fixed, but the sensitivity is much smaller than for BSO and BGO. In addition, these crystals are not very sensitive in the near-infrared regime compatible with semiconductor sources. We are currently investigating these issues related to the nonlinear recording materials and their impact on the architecture described. It is not our intention to perform materials research, but it is necessary to study the impact of materials parameters on the performance of the device.
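How long a stored interconnect survives under readout can be estimated with a simple model. The Python sketch below assumes, hypothetically, that the erasure time constant scales inversely with the readout intensity (tau = K / I), that the grating amplitude decays exponentially, and that the diffraction efficiency of a weak grating scales as the square of that amplitude. The constant K and the numbers in the example are placeholders, not measured BSO or BGO data, and `usable_time` is our own name.

```python
import math

def usable_time(erase_constant: float, intensity: float, min_fraction: float) -> float:
    """Time (s) until diffraction efficiency drops to min_fraction of its start value.

    Hypothetical model, not measured data:
      * erasure time constant tau = erase_constant / intensity  (J/cm^2 over W/cm^2)
      * grating amplitude decays as exp(-t / tau)
      * weak-grating efficiency scales as amplitude**2, so eta ~ exp(-2 t / tau)
    """
    if not 0.0 < min_fraction < 1.0:
        raise ValueError("min_fraction must lie between 0 and 1")
    tau = erase_constant / intensity
    # eta(t) = eta0 * exp(-2 t / tau)  =>  t = (tau / 2) * ln(1 / min_fraction)
    return 0.5 * tau * math.log(1.0 / min_fraction)

# Placeholder numbers: K = 1e-3 J/cm^2, I = 1 W/cm^2, stop when efficiency halves.
t_half = usable_time(1e-3, 1.0, 0.5)
print(f"{t_half * 1e3:.3f} ms")  # 0.347 ms
```

With these placeholder values tau is 1 ms, in the millisecond range quoted above, and the efficiency halves after about 0.35 ms; under this model, doubling the communication intensity halves that time, which is why the intensity of the communication waves matters.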

To address the feasibility of the interconnect system as currently envisioned, we have planned several experiments. First, we have prepared an experiment involving the optical architecture described, with the exception that film is used instead of nonlinear recording materials. The phase conjugate mirror is replaced by sensitive silver-halide holographic film, and the interconnect element will be made of relatively insensitive film. The procedure for establishing the interconnect pattern is similar to that described, except that the recording is, of course, no longer dynamic in nature, and chemical processing is needed to develop the film. The issues we are particularly interested in investigating are the effect of spurious reflections on the interconnect performance and the feasibility of the approach in terms of establishing the communication pattern. For instance, in practice it may be necessary to place the source and receiver of each processor at slightly different locations. Therefore, the resolution of the interconnect grating should be such that some degree of aberration occurs, so that light is received at the desired location. It may in fact be possible to use the same element on the processor for both the source and the receiver, in which case this issue disappears. We have designed the optical architecture for this experiment, and the necessary optical components are being procured. We hope to carry out this experiment during the first quarter of 1986.

Upon completion of the preliminary experiment involving film, we intend to replace the optical interconnect element with a nonlinear crystal such as BSO or BGO. In particular, we are interested in investigating the effect of spurious gratings in this element due to the establishment of the grating in the phase conjugator. We may be able to take advantage of the anisotropic behavior of these crystals. For instance, gratings with a grating vector aligned parallel to the direction of an applied field are more efficient than those in other directions. Therefore we may be able to selectively favor reflection gratings over the transmission gratings that are formed due to spurious reflections.

Administrative  Matters 

The personnel involved in this investigation have been Prof. J.W. Goodman and Prof. L. Hesselink, Principal Investigators, and two graduate student research assistants. Two oral presentations were made to the Optical Computing Consortium Review Panel, one in August in San Diego and the second in October in Washington, DC. As yet, no other technical papers have been presented, either orally or in writing.
Abstract


Research  on  thresholding  operations  in  optical  computing 
was carried out. Specific tasks included (1) investigations of
all-optical  threshold  elements  and  networks,  (2)  theoretical  and 
experimental  work  on  holographic  weighting  for  threshold  systems, 
(3)  experimental  and  computer  simulation  studies  of  adaptive 
matched  spatial  filtering,  and  (4)  initial  investigations  of 
coherence  theory  in  optical  computing.  A  summary  of  this 
research  is  followed  by  ten  papers  that  describe  significant 
experimental,  analytical,  and  computer  simulation  results. 


TECHNICAL  SUMMARY 


1 .  OBJECTIVES 

Objectives of the research at the University of Dayton Research Institute (UDRI) were to investigate thresholding operations in optical computing in the broadest and most fundamental manner consistent with the successful identification and development of certain research breakthroughs. These breakthroughs are widely believed to be necessary to realize the potential of optical computing for multiple-order-of-magnitude improvements in speed, power consumption, size, reliability, etc., compared to current and projected all-electronic computing technology.

2. DESCRIPTION OF WORK PERFORMED AND RESULTS

Summaries of research in the four major task areas, with the lead principal investigators indicated, are given below. More detailed descriptions are given in the ten papers that comprise the Technical Discussion.

(a) All-optical Threshold Elements and Networks (D. L. Flannery, S. C. Gustafson, L. M. Vail).

The potential of optics-based technology for performing the basic decision and interconnection operations required in any data processing system was analyzed. Optical threshold logic designs for elementary register-level operations were developed, including 2- and 8-bit multiply-add designs and designs for signed-digit arithmetic. Circulating packet and lock-and-clock architectures suitable for current and projected bistable optical devices were also identified. Finally, electro-optically implemented Grossberg neural network models for adaptive pattern recognition were considered.




(b) Theoretical and Experimental Work on Holographic Weighting for Threshold Systems (S. C. Gustafson, G. R. Little).

Optical  processing  systems  characterized  by 
thresholding  operations  concentrated  at  one  functional 
location  were  analyzed.  A  complete  design  for  a  lumped 
threshold  2-bit  multiplier  was  developed.  Holographic 
implementations  of  the  required  weighting  operations  were 
identified  for  the  coherent  source,  complex  weight  case.  A 
four-input-bit  experimental  effort  was  carried  out  for  this 
case that included an improved phase stabilization/control
scheme.  The  possible  use  of  optical  phase  conjugation  in 
systems  with  holographically  implemented  weighting  or 
interconnection  operations  was  also  assessed. 

(c) Experimental and Computer Simulation Studies of Adaptive Matched Spatial Filtering (D. L. Flannery, J. S. Loomis, L. M. Vail).

Theoretical and experimental work was carried out on a laboratory correlator that uses binary magneto-optic spatial light modulators for both the image input and a real-time programmable binary phase-only filter. An adaptive matched spatial filtering concept based on rapidly sequenced filters was defined and assessed as having high potential for real-time image processing in recognition, discrimination, and tracking tasks. Image coding using pseudorandom shift register sequences, which may be valuable in the determination of optimum correlator aperture patterns, was also assessed.

(d)  Initial  Investigations  of  Coherence  Theory  in  Optical 
Computing  (S.  C.  Cartwright). 

A consideration of real-time nonlinear optical processors led to the realization that optimum optical processing system designs or design tradeoffs might be identified through the comprehensive application of coherence theory. The cross-spectral density function was determined to be suitable for this purpose, particularly if generalized gratings or arrays of optical bistable devices (mutually incoherent if they exhibit gain) are included in the system.
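As an illustration of the threshold-logic register-level designs mentioned under (a) and (b), the Python sketch below builds a 2-bit by 2-bit multiplier entirely from threshold gates, each of which outputs 1 when a weighted sum of its binary inputs reaches a threshold. It is a generic construction written for this summary; it is not the UDRI lumped-threshold design, and the gate weights and the names `threshold_gate` and `multiply_2bit` are our own choices.

```python
from itertools import product
from typing import Sequence, Tuple

def threshold_gate(inputs: Sequence[int], weights: Sequence[int], threshold: int) -> int:
    """Output 1 if the weighted input sum reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def multiply_2bit(a1: int, a0: int, b1: int, b0: int) -> Tuple[int, int, int, int]:
    """Product bits (p3, p2, p1, p0) of (a1 a0) x (b1 b0), using threshold gates only."""
    # Layer 1: partial products are AND gates (weights 1, 1; threshold 2).
    p0 = threshold_gate([a0, b0], [1, 1], 2)
    x = threshold_gate([a1, b0], [1, 1], 2)
    y = threshold_gate([a0, b1], [1, 1], 2)
    z = threshold_gate([a1, b1], [1, 1], 2)
    # Layer 2: carry out of bit 1 is x AND y.
    c = threshold_gate([x, y], [1, 1], 2)
    # Layer 3: XOR(x, y) = [x + y - 2c >= 1]. Since c = 1 forces z = 1,
    # z XOR c reduces to z AND NOT c, and z AND c reduces to c itself.
    p1 = threshold_gate([x, y, c], [1, 1, -2], 1)
    p2 = threshold_gate([z, c], [1, -1], 1)
    p3 = c
    return p3, p2, p1, p0

# Exhaustive check against ordinary integer multiplication.
for a1, a0, b1, b0 in product((0, 1), repeat=4):
    p3, p2, p1, p0 = multiply_2bit(a1, a0, b1, b0)
    assert 8 * p3 + 4 * p2 + 2 * p1 + p0 == (2 * a1 + a0) * (2 * b1 + b0)
```

The sketch uses three gate layers; a lumped-threshold design instead concentrates the thresholding at one functional location and moves the complexity into the weighting, which is where holographic weighting enters.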

3.  CONCLUSIONS  AND  RECOMMENDATIONS 

Research in the areas of all-optical threshold elements and networks, and of phase conjugation and other nonlinear techniques for holographic thresholding and weighting, will provide the best opportunities for long-range developments of fundamental importance. In contrast, it is anticipated that research on adaptive matched spatial filtering and on the application of coherence theory will provide the best opportunities for short-range developments of more immediate practical importance.

Two  specific  recommendations  for  long-range  development  are  as 
follows.  First,  the  circulating  packet,  lock-and-clock,  and  related 
all-optical  architectures  should  be  studied  using  simple  "test  case" 
problems,  both  experimentally  and  in  computer  simulations.  Second, 
holographic  systems  for  discrete  pattern  recognition,  table  look-up 
computation  and  related  operations  should  be  investigated  in  the 
same  way.  The  incorporation  of  neural  network  or  associative  memory 
concepts  and  techniques  such  as  phase  conjugation  in  either  of  these 
areas has particular promise for achieving multiple-order-of-magnitude computer performance increases.
TECHNICAL DISCUSSION


Attached are ten papers, two published and eight submitted or prepared for publication, that describe significant experimental, analytical, and computer simulation results. A majority of
the  research  reported  in  each  of  these  papers  was  supported  by 
SDIO/IST  through  ONR  contract  No.  N00014-85-K-0479  and  conducted 
at  the  University  of  Dayton  from  1  June  to  31  December  1985.  The 
papers  are  listed  by  their  major  task  area  below. 

1.  All-Optical  Threshold  Elements  and  Networks 

D. L. Flannery and L. M. Vail, "All-Optical Decision Elements and Optical Computing."

S. C. Gustafson, "Thresholding and Weighting in Optical Computing," revision of book chapter submitted for publication in Optical Computing, Marcel Dekker, 1986.

S. C. Gustafson, D. L. Flannery, and R. O. Winder, "Analysis of Threshold Logic for Applications to Optical Computing," Proc. SPIE 564, pp. 173-178, San Diego, CA, Aug. 1985.

L.  M.  Vail  and  D.  L.  Flannery,  "Robust  Tracking  Using 
Electro-Optically  Implemented  Neural  Networks." 

2.  Theoretical  and  Experimental  Work  on  Holographic  Weighting 
for  Threshold  Systems 

S. C. Gustafson, J. A. Kirk, G. R. Little, R. P. Kenan, and C. M. Verber, "Optical Implementations of Lumped Threshold Logic," Proc. SPIE 564, pp. 157-166, San Diego, CA, Aug. 1985.

G.  R.  Little,  "Phase  Stabilization  and  Control  Technique 
with  Improved  Precision,"  to  be  submitted  to  Applied  Optics. 


* 


£ 


7-8 


v.vvi  --sm  !■&  W} 


G.  R.  Little,  "Holographic  Weighting  and  Phase  Conjugation 
for  External  Thresholding  Architectures." 

3. Experimental and Computer Simulation Studies of Adaptive Matched Spatial Filtering

D. L. Flannery and J. S. Loomis, "Adaptive Matched Filtering."

S. C. Gustafson, "Image Coding Using Pseudorandom Shift Register Sequences," to be published in Proc. SPIE 514, paper no. 46, presented Cannes, France, Dec. 1985.

4. Initial Investigations of Coherence Theory in Optical Computing

S.  C.  Cartwright,  "Note  on  Coherence  Theory  and  Optical 
Processing  Systems." 


ALL-OPTICAL  DECISION  ELEMENTS 
AND  OPTICAL  COMPUTING 


David L. Flannery, S. C. Gustafson, and L. Maugh Vail
University  of  Dayton  Research  Institute 
Dayton,  Ohio  45469 


Abstract 


Some aspects of the design of optical computers using all-optical decision elements are reviewed. In particular, the circulating packet and lock-and-clock architectures are described and related to current and projected bistable optical devices. As an example, threshold logic implementations of signed-digit arithmetic are considered.
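Signed-digit arithmetic is attractive for parallel optical hardware because a redundant digit set lets addition proceed without a carry that ripples across the whole word. The Python sketch below shows the standard two-step carry-free rule for radix-2 signed digits {-1, 0, 1}, stored least significant digit first. It illustrates the arithmetic only and is not the threshold-logic implementation developed in this paper; the names `sd_value` and `sd_add` are hypothetical.

```python
from itertools import product
from typing import List

def sd_value(digits: List[int]) -> int:
    """Integer value of a radix-2 signed-digit number (least significant digit first)."""
    return sum(d * 2 ** i for i, d in enumerate(digits))

def sd_add(x: List[int], y: List[int]) -> List[int]:
    """Carry-free addition of radix-2 signed-digit numbers with digits in {-1, 0, 1}.

    Step 1 splits each position sum s[i] into 2*t[i+1] + w[i], choosing by looking
    only at s[i] and s[i-1]; step 2 adds the incoming transfer t[i] to w[i].
    No carry ripples; the loop is sequential only for readability.
    """
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    s = [a + b for a, b in zip(x, y)]   # position sums, each in -2..2
    t = [0] * (n + 1)                   # t[i]: transfer into position i
    w = [0] * n                         # interim digits
    for i in range(n):
        lower_nonneg = i == 0 or s[i - 1] >= 0
        if s[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif s[i] == -2:
            t[i + 1], w[i] = -1, 0
        elif s[i] == 1:
            t[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif s[i] == -1:
            t[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
    # Every result digit w[i] + t[i] stays in {-1, 0, 1}.
    return [w[i] + t[i] for i in range(n)] + [t[n]]

print(sd_add([1, -1, 0, 1], [1, 1, -1, 0]))  # [0, 1, -1, 1, 0]

# Exhaustive check over all 3-digit operands.
for xs in product((-1, 0, 1), repeat=3):
    for ys in product((-1, 0, 1), repeat=3):
        z = sd_add(list(xs), list(ys))
        assert sd_value(z) == sd_value(list(xs)) + sd_value(list(ys))
        assert all(d in (-1, 0, 1) for d in z)
```

In the example, 7 + (-1) gives the digits [0, 1, -1, 1, 0], whose value is 2 - 4 + 8 = 6. Each output digit depends only on a window of three input positions, which is the locality that makes a parallel threshold-logic realization plausible.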
Bistable Optical Devices and Transphasors


Bistable optical devices (BOD) and the closely related transphasor devices are the subject of considerable current and planned research and development. The recently published book Optical Bistability: Controlling Light with Light, by H. Gibbs (1985), is a comprehensive source on both the theory and recent experimental accomplishments in this area, although it excludes the laser and laser amplifier device approaches, which have recently shown promise.

An attractive concept for the ultimate all-optical computer, using optical logic elements and capitalizing on optical interconnections, is what we prefer to call the circulating packet (CP) architecture, shown in Figure 1. This is only one possible concept and is itself subject to many possible variations. The comparative analysis of such candidate architectures, strongly coupled with critical assessment of proposed device concepts and their ultimate potential for CP computer implementation, is a critical need.


The CP architecture reflects the current strategy for developing an all-optical computer that will be orders of magnitude superior to competing electronic computers on the basis of the important figures of merit, which include not only raw throughput (operations/second) but, probably of greater importance, specific power consumption (ops/sec/watt) and specific size (ops/sec/unit volume). It now appears that optical logic elements will at best equal electronic elements in gate density and in switching speed and energy. Thus, a reasonable strategy relies on the inherent parallelism and noninterfering propagation of optics to provide the orders-of-magnitude advantage through superior interconnections. Algorithms and architectures that capitalize on the inherent parallelism afforded by optical interconnects will be an important ingredient in attaining optical computing superiority for many problem types.


From the preceding discussion, two required areas of device research and development are apparent: optical interconnects and optical logic elements. The interconnect area is of obvious paramount importance, since it is expected to yield the winning advantage over electronics;5 it is reflected in work on various types of holographically interconnected device concepts,6 such as multiplier modules.7 The other area, optical logic devices, is the subject of this section. Optical input/output logic elements will be critical to the success of the optical computer; conversion between the optical and electronic domains at each logic element would be unacceptable because of the device overhead and the associated power dissipation.8

Gibbs1 (p. 305) has succinctly summarized the salient characteristics of an ideal discrete BOD:

• Small size: characteristic dimension of a micrometer.

• Switching energy less than 1 fJ (or holding power less than 1 mW).

• Speed: switching time of 1 ps or less.

• Room-temperature operation.

• Integrable, to permit large numbers of interconnections insensitive to external perturbations.

To these can be added other obvious considerations, such as gain (required in the CP loop), fan-in and fan-out (cascadability), an inverting logic element (NOT), and input-output isolation. Since a new computing technology is to be developed, the application of threshold logic in optical computing is worthy of investigation, based on its known advantages in gate count, logic levels, and reduced interconnections. As has been pointed out by Armstrong,3 an ultimate practical issue for any logic element type, with more stringent requirements in the threshold logic case, is manufacturability, i.e., producing large arrays of gates with sufficiently small tolerances on threshold levels and weights. Considering the early state of development of BODs, a serious consideration of the manufacturability question would be premature. Nevertheless, such issues are worth keeping in mind as we assess and analyze candidate logic gate approaches.
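To make the gate-count and tolerance arguments concrete, here is a minimal Python sketch of what a single linear-inequality threshold element computes. The weights, thresholds, and helper names (`threshold_gate`, `noise_margin`) are illustrative assumptions, not a proposed device design. One element realizes the three-input majority function, which in Boolean logic needs a two-level AND-OR network. A negative weight gives the inverting (NOT) element. The noise margin is the smallest distance between any input's weighted sum and the threshold.

```python
from itertools import product
from typing import Callable, Sequence


def threshold_gate(weights: Sequence[float], threshold: float) -> Callable[..., int]:
    """Linear-inequality threshold element: y = 1 if sum_i w_i * x_i >= T, else 0."""
    def gate(*x: int) -> int:
        return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)
    return gate


def noise_margin(weights: Sequence[float], threshold: float) -> float:
    """Smallest gap between any binary input's weighted sum and the threshold."""
    return min(abs(sum(w * xi for w, xi in zip(weights, x)) - threshold)
               for x in product((0, 1), repeat=len(weights)))


# One element replaces the two-level AND-OR network x1x2 + x1x3 + x2x3.
majority = threshold_gate((1, 1, 1), 1.5)
# A negative weight yields an inverting (NOT) element.
inverter = threshold_gate((-1,), -0.5)

# Check the threshold element against the Boolean form on every input.
for x in product((0, 1), repeat=3):
    boolean = (x[0] and x[1]) or (x[0] and x[2]) or (x[1] and x[2])
    assert majority(*x) == int(boolean)

print("majority margin:", noise_margin((1, 1, 1), 1.5))  # prints 0.5
```

In this toy model, weight and threshold errors leave the function unchanged as long as they shift every weighted sum, relative to the threshold, by less than the margin. Here the margin is 0.5, about 17% of the total weight of 3. Gates with more inputs or finer weights generally have smaller relative margins, which is one way to read the more stringent tolerance requirement noted for threshold logic.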

No BOD device approach has yet demonstrated the combination of characteristics stated above for an ideal computing element. However, a number of promising research results on several attractive device concepts have been reported recently (e.g., at the Third OSA Topical Meeting on Optical Bistability, Tucson, Arizona, 2-4 December 1985).3 Progress is being made, and no fundamental barrier to the desired performance has been discovered. Devices having switching energies of 1 pJ or holding powers of several mW have been demonstrated. Area-scaled projections based on current devices indicate the potential for fJ switching energies in micrometer-dimension devices, although such scaling does not always hold. Picosecond switching times have also been demonstrated. The materials and structures under investigation have potential for integration. Recent reports of laser and laser-amplifier BODs offer hope that the gain requirement can be met. Thus it is reasonable to expect that, given adequate research and development effort, such devices will be available in the future.

Circulating  Packet  Architectures 

BODs having picosecond switching times are required for the CP architecture, and several promising candidates are being investigated. These include GaAs etalons, with or without multiple quantum wells (MQW), GaAs self-electro-optic-effect devices (SEED),1 CdS etalon devices,1 and InGaAsP/InP laser structures.3 This list is not exhaustive, but it includes the approaches currently showing the greatest promise. Other approaches are under investigation and may show greater promise in the future.

A careful study and assessment of promising devices relative to their application in CP-type architectures is advisable, both to form tentative comparative rankings and (probably of more practical value) to define critical issues that can aid both the assessment of current device work and the planning of future work. As an example of such analysis, consider the electrical connections and power dissipation that would be associated with dense arrays of two promising device types: SEED devices and laser devices.

Both types promise attractive optical and total (electrical plus optical) switching energies, if they can be scaled to micrometer sizes. Both require electrical connections to each logic element, and both actually dissipate at least 80% of their switching energies electrically as deposited thermal energy. However, the details are totally different for the two approaches. Using reasonable estimates of on-chip connection capability based on electronic integrated circuit technology (e.g., conductor size, resistance, capacitance), plus projected device characteristics, the power dissipation, speed, and other pertinent parameters of device arrays can be analyzed. Such an analysis should provide useful information and uncover issues that deserve further study.

A subject of legitimate concern is the performance limitation that might be imposed on a CP computer by thermal dissipation in the logic gate array plane. It is much too early to perform definitive analyses of this issue; however, the following represents a reasonable projection based on currently known factors.

Assume:

25 µm × 25 µm gate element size
8 µm² gate active area
10 fJ/µm² switching energy per unit area
100 ps switching cycle
300 ps clock period
50% gate duty cycle (i.e., on average 50% of gates switch each clock period)

These assumptions lead, through simple and obvious calculations, to the following performance numbers:

Gate density: 1.6 × 10⁵ gates/cm²
Throughput: 2.7 × 10¹⁴ switching events/cm²/s
Power dissipation: 21.3 W/cm²
Photons/switching event: about 10⁵
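The "simple and obvious calculations" can be written out as a short Python sketch. It takes the assumed values listed above and reproduces the gate density, throughput, and power density. It also checks the 300 ps clock period against the loop size discussed below (a free-space round trip of about 9 cm). The 100 ps switching cycle does not enter these figures directly. The photon count needs two inputs the text does not state: the operating wavelength and the fraction of the switching energy that is carried optically. Both are labeled as hypothetical in the code. With the whole 80 fJ treated as optical at 850 nm, the estimate is a few times 10⁵. With only about 20% optical, consistent with the 80% electrical dissipation noted above, it is under 10⁵. Either way it is the same order as the quoted "about 10⁵".

```python
# Back-of-envelope reproduction of the CP gate-plane projection (a sketch).
CELL_SIDE_UM = 25.0        # gate element (cell) side length, micrometers
ACTIVE_AREA_UM2 = 8.0      # gate active area, square micrometers
ENERGY_FJ_PER_UM2 = 10.0   # switching energy per unit active area, fJ/um^2
CLOCK_PERIOD_S = 300e-12   # clock period (CP loop round trip), seconds
DUTY_CYCLE = 0.5           # fraction of gates that switch each clock period

# Hypothetical values NOT stated in the text, used only for the photon estimate:
WAVELENGTH_M = 850e-9      # assumed laser-diode wavelength
OPTICAL_FRACTION = 1.0     # assumed fraction of switching energy carried optically

H = 6.62607015e-34         # Planck constant, J*s
C = 2.99792458e8           # speed of light in vacuum, m/s
UM2_PER_CM2 = 1e8          # square micrometers per square centimeter

gate_density = UM2_PER_CM2 / CELL_SIDE_UM ** 2                 # gates/cm^2
energy_per_switch_j = ACTIVE_AREA_UM2 * ENERGY_FJ_PER_UM2 * 1e-15
events_rate = gate_density * DUTY_CYCLE / CLOCK_PERIOD_S       # events/cm^2/s
power_density = events_rate * energy_per_switch_j              # W/cm^2
photon_energy_j = H * C / WAVELENGTH_M
photons = OPTICAL_FRACTION * energy_per_switch_j / photon_energy_j
round_trip_cm = C * CLOCK_PERIOD_S * 100                       # free-space loop length

print(f"gate density       {gate_density:.2g} gates/cm^2")
print(f"throughput         {events_rate:.2g} events/cm^2/s")
print(f"power dissipation  {power_density:.3g} W/cm^2")
print(f"photons per event  {photons:.1g}")
print(f"loop round trip    {round_trip_cm:.2g} cm")
```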

Some discussion is in order. The cell size and gate area chosen are believed to be reasonably conservative. The gate area is at least ten times the diffraction-limited spot area for laser diode wavelengths. The cell area is much larger than the gate area, which allows room for other connections or devices that might be required with a given gate approach. This also favors isolation of the devices, both thermally and electronically. The switching energy assumed is also believed to be reasonably conservative in view of recently reported results for SEED devices, and it is an order of magnitude above the theoretical limits claimed for that device type. The assumption of a 100 ps switching time is not unreasonable, as several current device approaches have reached or are approaching that speed. The 300 ps clock period is based on what seems to be a reasonable CP loop size; it corresponds to a round-trip distance of about 9 cm. The 50% duty cycle is rather arbitrarily chosen. This factor should depend strongly on both the problem being solved and the algorithm used; however, duty cycles exceeding this value seem unlikely.

The resulting power dissipation density is not at all prohibitive: values ranging from 100 W/cm² to as high as 700 W/cm² are achieved using forced liquid cooling. It must be noted that the figure given is a lower bound, since it includes only the electrical and optical power dissipated in gate switching; other sources of power dissipation may be associated with a given computer approach. The number of photons per switching event is sufficient to support threshold logic operation with 1.5% threshold tolerances and noise margins of five times the standard deviation of the photon noise.
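One way to check the photon-noise claim is sketched below. It assumes shot-noise-limited (Poisson) photon statistics, so that N photons per switching event fluctuate with standard deviation √N. It also assumes that a five-sigma noise band must fit within the 1.5% threshold tolerance. Both readings are our assumptions, not stated in the text.

```latex
\sigma_N = \sqrt{N}, \qquad
\frac{5\,\sigma_N}{N} = \frac{5}{\sqrt{N}} \le 0.015
\;\Longrightarrow\;
N \ge \left(\frac{5}{0.015}\right)^{2} \approx 1.1 \times 10^{5}
```

At N = 10⁵ the five-sigma band is about 1.6% of the signal. Under these assumptions, "about 10⁵" photons per event is indeed the scale required for roughly 1.5% threshold tolerances.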

The performance potential suggested by the above projection must be viewed as a strong incentive to pursue all-optical computation technology. Throughput at least on a par with current supercomputers is projected for each square centimeter of gate array plane area. The volume of the CP optical package associated with that performance would be on the order of 10 cm³. Of course, any realistic assessment must include a volume and power allowance for the cooling system, input/output interface hardware, and other support requirements. If we assume that this remaining hardware requires 1000 times the volume of the core CP module, the resulting total computer volume is 10,010 cm³, or about 0.35 cubic feet. This is at least two orders of magnitude below that of a supercomputer such as the Cray.
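The volume estimate is a two-line calculation, sketched here in Python with the text's own assumptions: a 10 cm³ core and support hardware 1000 times that volume. The only added fact is the exact inch-to-centimeter conversion.

```python
# Volume estimate for a complete CP computer (sketch of the arithmetic above).
CORE_CM3 = 10.0                  # CP optical package, order of magnitude (from the text)
SUPPORT_FACTOR = 1000            # support hardware volume / core volume (as assumed above)
CM3_PER_FT3 = (12 * 2.54) ** 3   # exact: 1 ft = 12 in, 1 in = 2.54 cm

total_cm3 = CORE_CM3 * (1 + SUPPORT_FACTOR)
total_ft3 = total_cm3 / CM3_PER_FT3
print(f"{total_cm3:,.0f} cm^3 = {total_ft3:.2f} ft^3")  # prints: 10,010 cm^3 = 0.35 ft^3
```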

Lock-and-Clock  Architectures 

By lock-and-clock architectures, we refer to the type of architecture proposed by S.D. Smith and co-workers,10,11 as shown in Figure 2. The important difference between this and the CP-type architectures is the assumption that switching times, and thus minimum logic pulse durations, are longer than the round-trip transit time of the optical loop. This necessitates the latching (or locking) of logic signals at certain planes, as well as precautions to prevent the continuous feeding of a signal around the loop (i.e., isolating planes). Smith10 has shown that, for transmission bistable latching array planes, these requirements are minimally met by using three such planes in the loop with an appropriate three-phase system clock.


The type of BOD envisioned for use in the lock-and-clock architecture is the ZnSe interference filter (etalon) device. These devices are thermally switched and exhibit switching times of 10 µs or greater. Smith and co-workers have published extensively on these devices and have proposed architectures that would utilize them (e.g., Figure 2). They have announced plans to demonstrate a lock-and-clock computer within one or two years. A clever example of another architecture that solves a specific problem with these devices has been given by Wherrett.


This section outlines a suggested implementation of signed-digit arithmetic using threshold logic in a CP all-optical computer. The modified signed-digit (MSD) arithmetic system, which uses the radix-2 subset of general signed-digit arithmetic, has already been shown to have promise for optical implementation by Drake, Bocker, Lasher, Patterson, and Miceli. Their analysis resulted in the definition of key MSD logic modules that enable carry-free addition and subtraction in parallel pipeline configurations, which capitalize on the intrinsic optical computing advantages of parallelism, non-interfering connections, and high bandwidth. In this section, the realization of the MSD key logic modules is analyzed using arrays of optical threshold logic elements in a circulating packet optical computing architecture. Preliminary results are favorable: the key modules can be realized with only a few threshold logic elements each, and proper interconnection is easily implemented in the circulating packet architecture. These results provide motivation to extend the research toward higher-level functions and to consider other architectural variations, as well as signed-digit representations of radix other than two.
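For readers unfamiliar with MSD arithmetic, the Python sketch below shows why addition can be carry-free. Digits are in {-1, 0, 1}, radix 2. The sketch uses one standard three-step rule set. Steps 1 and 2 each split a digit-wise sum into a transfer digit and a weight digit, and step 3 adds the weights to the incoming transfers. Every output digit depends only on its own position and the one below, so all positions can be computed in parallel. The transfer/weight tables and helper names are our assumptions for illustration; they are not claimed to match the four key modules defined by Drake et al. or the threshold logic designs in Figures 3 through 6.

```python
from typing import Dict, List, Tuple

Digits = List[int]                 # MSD digits in {-1, 0, 1}, least significant first
Rule = Dict[int, Tuple[int, int]]  # digit-wise sum -> (transfer out, weight)

# Step 1: x_i + y_i = 2*t_{i+1} + w_i. The choice for sums of +1/-1 prevents overflow in step 3.
STEP1: Rule = {-2: (-1, 0), -1: (-1, 1), 0: (0, 0), 1: (1, -1), 2: (1, 0)}
# Step 2: w_i + t_i = 2*t'_{i+1} + w'_i.
STEP2: Rule = {-2: (-1, 0), -1: (0, -1), 0: (0, 0), 1: (0, 1), 2: (1, 0)}


def value(d: Digits) -> int:
    return sum(di * 2 ** i for i, di in enumerate(d))


def to_msd(n: int) -> Digits:
    """Ordinary binary with the sign applied to every digit (one valid MSD form)."""
    sign, n = (-1, -n) if n < 0 else (1, n)
    digits = []
    while n:
        digits.append(sign * (n & 1))
        n >>= 1
    return digits or [0]


def split(a: Digits, b: Digits, rule: Rule) -> Tuple[Digits, Digits]:
    """Apply a transfer/weight rule at every position independently (no carry chain)."""
    pairs = [rule[x + y] for x, y in zip(a, b)]
    assert pairs[-1][0] == 0, "operands need more zero padding"
    t = [0] + [ti for ti, _ in pairs[:-1]]   # t[i] = transfer into position i
    w = [wi for _, wi in pairs]
    return t, w


def msd_add(x: Digits, y: Digits) -> Digits:
    n = max(len(x), len(y)) + 2              # two guard digits absorb the transfers
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    t, w = split(x, y, STEP1)                # step 1
    t2, w2 = split(w, t, STEP2)              # step 2
    s = [a + b for a, b in zip(w2, t2)]      # step 3: plain digit-wise sum
    assert all(d in (-1, 0, 1) for d in s)
    return s


def msd_sub(x: Digits, y: Digits) -> Digits:
    return msd_add(x, [-d for d in y])       # negation is digit-wise in MSD


print(value(msd_add(to_msd(13), to_msd(-6))))  # prints 7
```

Subtraction needs no separate hardware path because negating an MSD number simply negates each digit. This is one reason the representation pairs well with parallel optical logic.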

The effort to date has generated threshold logic implementations of the four key modules for MSD addition and subtraction in the circulating packet (CP) architecture. Each trinary digit was encoded as two binary digits, and the CP implementation involved a latency of one clock cycle to achieve the two levels of threshold logic involved. Considerable modularity was realized because of the commonality among the threshold logic elements used by the different key modules. Figures 3 through 6 show the threshold elements and CP architecture implementations for two key modules. The module designs are preliminary, and additional effort investigating different digit encoding schemes and alternative threshold logic element choices is justified. The goal will be to minimize both the number and the types of logic elements, the latter to achieve greater modularity.

The design of threshold logic implementations based on signed-digit arithmetic with a radix other than two, and the implications of higher radixes for hardware tolerances (e.g., threshold tolerances), are worthy of consideration. Based on reasonably optimized key MSD logic module designs, the design of high-level register-type devices, such as a multiply-and-add module, should be possible. This would yield useful information concerning the viability of this approach in comparison to others.

The CP architecture we have considered is only one of many possible variations; other variations may prove to be optimal for some or all applications. To review, the CP architecture we have assumed involves a single plane containing an array of threshold logic elements and a feedback path capable of general interconnections, i.e., between any output of the logic plane and any input. We have assumed a synchronous or clocked system locked to the round-trip transit time of a data packet. No use of pipelining has been considered. Obvious variations worthy of consideration include two or more cascaded logic planes, storage of multiple data packets in the loop, and even multiple optical loop configurations. Variations in how control or inputs are introduced are obviously possible. Perhaps the number of dummy or pass-through elements required to implement the two-level MSD key modules in the baseline CP configuration can be reduced or eliminated using the variations mentioned. This is an example of just one factor in the overall optimization problem posed by CP-type architectures. A generally important and probably very difficult task is to define the overall optimization criteria for the CP architectures, or essentially, for the all-optical computer.
Figure 1. Circulating packet architecture for an all-optical computer.
Figure 2. Lock-and-clock architecture for an all-optical computer.
Threshold Logic Implementation of W-module.
Abstract


Theoretical and experimental work on a laboratory correlator that uses binary magnetooptic spatial light modulators for both image input and real-time programmable binary phase-only filtering is reviewed and extended. The corresponding adaptive matched filtering concept appears to have significant potential for real-time image processing applicable to recognition, discrimination, and tracking tasks.


Introduction 


This paper builds on the experimental and theoretical coherent correlation capability reported in "Improved Optical Filters for Image Tracking."1 In that effort, a laboratory correlator using binary magnetooptic spatial light modulators (SLMs) for both image input and real-time programmable binary phase-only filtering (BPOF) was assembled and operated. Figure 1 is a diagram of the correlator optical system. The SLMs were Litton LIGHT-MOD™ devices having 48-by-48 elements. Results were in excellent agreement with a theoretical model based on the fast Fourier transform and confirmed the high correlation performance achievable with BPOF techniques. Figure 2 shows initial theoretical and matching experimental results. The work reported here was motivated by the concept of adaptive correlation illustrated by the system schematic of Figure 3. This concept combines adaptive filter selection techniques with a rapidly programmable Fourier-plane filter (i.e., the BPOF realized with LIGHT-MOD™ devices) to yield a system capable of handling complex scenarios (which may require hundreds or even thousands of reference images) on a real-time basis. Efforts concentrated on achieving true real-time laboratory performance and on defining requirements for a filter/image data bank to support planned computer simulations of the adaptive correlator concept.
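As a rough, self-contained illustration of binary phase-only filtering, the Python sketch below builds a BPOF by binarizing the sign of the real part of a reference image's 2-D DFT. This is the filter binarization rule listed later for the image bank. The sketch applies the filter to a shifted copy of the reference and locates the correlation peak. It is not the authors' model. A plain separable O(N³) DFT stands in for the fast Fourier transform so that no third-party packages are needed. The 16 × 16 random binary "target", the shift, and the helper names are hypothetical.

```python
import cmath
import random


def dft2(a, inverse=False):
    """Separable 2-D DFT of a square array (rows, then columns); O(N^3)."""
    n = len(a)
    s = 1 if inverse else -1
    w = [[cmath.exp(s * 2j * cmath.pi * j * k / n) for k in range(n)] for j in range(n)]
    rows = [[sum(row[k] * w[j][k] for k in range(n)) for j in range(n)] for row in a]
    out = [[sum(rows[k][c] * w[r][k] for k in range(n)) for c in range(n)] for r in range(n)]
    if inverse:
        out = [[v / (n * n) for v in row] for row in out]
    return out


def bpof(reference):
    """Binary phase-only filter: phase 0 (+1) where Re{F} >= 0, else phase pi (-1)."""
    return [[1.0 if v.real >= 0 else -1.0 for v in row] for row in dft2(reference)]


def correlation_plane(scene, filt):
    """Intensity |IDFT(S * H)|^2; H is real, so conj(H) == H and this is a correlation."""
    spectrum = dft2(scene)
    prod = [[sv * hv for sv, hv in zip(srow, hrow)] for srow, hrow in zip(spectrum, filt)]
    return [[abs(v) ** 2 for v in row] for row in dft2(prod, inverse=True)]


def circular_shift(img, dy, dx):
    n = len(img)
    return [[img[(r - dy) % n][(c - dx) % n] for c in range(n)] for r in range(n)]


N = 16
rng = random.Random(1)
reference = [[float(rng.random() < 0.5) for _ in range(N)] for _ in range(N)]
scene = circular_shift(reference, 3, 5)     # target moved 3 rows down, 5 columns right
plane = correlation_plane(scene, bpof(reference))
peak = max(((r, c) for r in range(N) for c in range(N)), key=lambda rc: plane[rc[0]][rc[1]])
print("correlation peak at", peak)
```

Because the filter is real and binarized on the real part, it responds to the symmetric part of the reference spectrum. With an asymmetric random target like this one, the flattened spectrum should still give a sharp peak at the target's (3, 5) shift.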


Laboratory  Correlator 


The results shown in Figure 2 were static in the sense that, although the spatial light modulator devices can be cycled at TV frame rates, the drive circuitry and computer software in use did not support the speed of the devices. Loading and programming a single image or filter pattern required about fifteen seconds, most of which involved disk access and file conversion activities. A major portion of the current effort was the generation of improved software for the Apple IIe computer to speed up the input and filter programming process. As a result, both filter and input images can now be sequenced at approximately four frames per second, a rate now limited by the processing capability of the Apple computer. The modified software also allows at least thirty image or filter patterns to be stored simultaneously in computer memory, and allows the operator to build complex sequences of image and filter inputs with variable time delays. Another valuable feature is a shift operation that generates straight-line motion of an input object. These new capabilities were combined to produce a videotape recording demonstrating dynamic correlation using the X-O input array shown in Figure 2 with both X and O filters. Figure 4 shows computer model results for correlation using a binarized image of the NASA space telescope. These results are as expected based on previous work2-5 and demonstrate the much sharper correlation peak achieved with phase-only correlation.

Adaptive  Correlator  Concept 


Plans have been developed for implementing a computer model of the adaptive correlator concept shown in Figure 3. Table 1 is a set of specifications generated as goals for the image/filter memory bank, which is a vital part of such a model. The bank consists of rapidly accessible digital images and corresponding Fourier transforms, representing parametric variations of target object images. Such variations could correspond to changes in size, aspect angle, and orientation angle (i.e., scale variation and two degrees of freedom of angular variation). Ideally, views (and transforms) of the target for all possible variations of both azimuth and elevation would be required, although a discrete set of views covering every 10° in rotation and each 10 percent change in scale may be adequate. A special case is the overhead view (elevation = 90°), for which only one angular variable (rotation, or orientation angle) is required. Note that the image bank model would be used to present sequences of filters (target object transforms) to the adaptive matched filter correlator simulation, and would also provide sequences of input scenes for the simulation. For input scenes, the target image would be superimposed on a background, such as typical terrain or a mathematically simulated random pattern, with shift (translation) as a parameter. (This is necessary to simulate tracking of a moving object.) In addition, the following features are required:

1. Option to provide images and filters in thresholded binary form (adjustable threshold for images; binarize on sign of real part of filters).

2. Ability to provide degraded-resolution versions of images and filters (e.g., averaging over 2-by-2, 3-by-3, 4-by-4 pixel cells).

3. Ability to easily vary the signal-to-noise (i.e., target-to-background) intensity ratio.

4. Ability to interface to correlation and dynamics computer models in a computationally efficient manner.
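Features 1 through 3 amount to simple per-pixel and per-cell operations. The Python sketch below illustrates them under stated assumptions. Degraded-resolution images are returned at reduced size rather than replicated back to full size. The hypothetical `compose` helper replaces background pixels under a binary target mask with a target intensity equal to `ratio`, treating the background as normalized to unit mean. The real image bank may define these operations differently.

```python
def binarize(img, threshold):
    """Thresholded binary version of a grey-scale image (adjustable threshold)."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def block_average(img, k):
    """Reduced-resolution image: mean over non-overlapping k-by-k pixel cells.
    Rows or columns that do not fill a complete cell are dropped."""
    rows, cols = len(img) // k, len(img[0]) // k
    return [[sum(img[r * k + i][c * k + j] for i in range(k) for j in range(k)) / (k * k)
             for c in range(cols)] for r in range(rows)]


def compose(target_mask, background, ratio):
    """Hypothetical scene builder: target pixels get intensity `ratio`,
    all other pixels keep the (unit-mean) background value."""
    return [[ratio * t + (1 - t) * b for t, b in zip(trow, brow)]
            for trow, brow in zip(target_mask, background)]


img = [[4 * r + c for c in range(4)] for r in range(4)]
print(block_average(img, 2))   # 2-by-2 cell averages of a 4-by-4 ramp
```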

The adaptive matched filtering concept (Figure 3) has significant potential for real-time image processing applicable to recognition, discrimination, and tracking tasks. A computer simulation incorporating an application scenario model, a correlator model, and an image/filter bank, as described above, is a key research step in the assessment of this concept.
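The Python sketch below counts the views implied by the scale and rotation specifications of Table 1, which shows why the bank may need hundreds or even thousands of reference filters. It rests on two assumptions of ours. The percentage scale steps are taken as geometric, with each view 10% (or 5%) larger than the last. Rotation is taken to cover a full 360°. Aspect (elevation) steps are omitted because the table does not specify them.

```python
import math


def n_scale_views(range_ratio: float, step: float) -> int:
    """Views needed to span a scale range with geometric steps of (1 + step)."""
    return math.ceil(math.log(range_ratio) / math.log(1 + step)) + 1


def n_rotation_views(step_deg: float) -> int:
    """Views needed to cover a full 360-degree rotation."""
    return math.ceil(360 / step_deg)


minimum = n_scale_views(5, 0.10) * n_rotation_views(10)          # per target object
ideal_per_aspect = n_scale_views(20, 0.05) * n_rotation_views(3)  # per target, per aspect angle
print(minimum, ideal_per_aspect)
```

Under these assumptions, the minimum implementation needs 18 × 36 = 648 views per target object. The ideal implementation needs 63 × 120 = 7560 views per target object for each aspect angle, before elevation is varied at all.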
Table 1. Suggested specifications for an image bank model

Parameter                           Minimum implementation      Ideal implementation
Image size (pixels)                 128 × 128                   1024 × 1024
Grey scale                          6 bits                      ≥ 8 bits
Number of different target objects  2                           Many
Number of different backgrounds     2                           Many
Target range (scale) variation      5:1, 10% steps              20:1, 5% steps
Target angular variations           Rotation only, 10° steps    Rotation, 3° steps; aspect (elevation)

Figure 4. Computer model results for correlation using a binarized image of the NASA space telescope: (a) binary image, (b) true autocorrelation, (c) BPOF correlation, no input noise, (d) BPOF correlation, 50% input noise.
Abstract


The potential of optics-based technology for performing the basic decision and interconnection operations required in any data processing system is reviewed. Examples in which only interconnection operations are performed optically, and in which both interconnection and decision operations are performed optically, are discussed. A general optical computing design that summarizes key issues is also presented.


1. INTRODUCTION


This paper reviews the potential of optics-based technology for performing the basic decision and interconnection operations required in any data processing system. Section 1 considers motivations and definitions. Section 2 considers examples in which only interconnection operations are performed optically, Section 3 considers examples in which both interconnection and decision operations are performed optically, and Section 4 considers a general optical computing design that summarizes key issues.

Since optical computing is a relatively new concept, it is appropriate to consider architecture and programming techniques that have not thus far proved widely useful in all-electronic computers, such as residue arithmetic, multilevel logic, and threshold logic. In general, research on such nonstandard techniques has not progressed to the point where more than tentative assessments of their value in optical computing can be made. This statement also applies to the optical computing thresholding and weighting operations defined in Section 1.2. Thus this paper is necessarily brief and speculative, and all examples are simple and intended mainly to motivate future research.

1.1 Potential of Optical Computing

Optical computing has significant potential for multiple-order-of-magnitude performance improvements, compared to current all-electronic computing, in areas such as speed, power consumption, size, memory, reliability, and fault tolerance. The fundamental advantage of optics (characterized by electromagnetic frequencies of perhaps 10¹⁴ Hz) compared to electronics (characterized by electromagnetic frequencies of perhaps 10⁷ Hz) is the ability of optics to make relatively numerous, complex, long-distance (global), and high-bandwidth interconnections [1,2]. Such optical interconnections are generally (a) non-interfering, i.e., optical beams can intersect without interference but electronic "wires" cannot, and (b) relatively free of transmission-line effects that can lead to significant time delays and corresponding performance constraints. Other possible advantages of optics are massive parallelism, high logic element speed, and easy three-dimensional design. These advantages are more accessible to all-electronic technology and are thus less fundamental than the interconnection advantage. In general, optics-based technology has the potential to excel in performing interconnection operations and to at least equal all-electronic technology in performing decision operations. This distinction may be attributed to the fact that the material nonlinearities required for decision operations are much more pronounced at the electromagnetic frequencies and intensities typical of all-electronic technology.

The potential of optical computing may be appreciated by considering, for example, a system consisting of a bistable optical device array in an all-optical feedback loop. Here the bistable devices perform decision operations, and the feedback loop (which may contain holograms, lenses, beamsplitters, and other optical components) performs interconnection operations. As indicated in Section 4, reasonable projections of current technology may permit this system to perform on the order of 10¹⁴ logical operations per second per square centimeter, or about 100 times higher than current VHSIC program goals [3]. A present limitation on the performance of this system is the power dissipation of the optical bistable device array; this limitation is an indication of the relative difficulty of using optics for decision operations. However, this difficulty should not preclude the development of optics-based computers with phenomenal performance compared to all-electronic computers. For example, numerous and complex interconnections, at which optics excels, are perhaps the most important feature of biological neural network architectures, in which each logic element (or neuron) may be connected to on the order of 10⁴ other elements [4,5]. Although the individual neuron switching or decision time is long compared to typical electronic gate switching times, these biological architectures are well known to have exceptional performance characteristics, e.g., highly adaptive image recognition using small-size, low-power-consumption "wetware".

1.2  Definition  of  Weighting  and  Thresholding 

Computer architectures and operations may be represented by a set of interconnected elements that make decisions. For example, any Boolean logic function (or truth table) can be implemented by a network of basic decision or logic elements such as AND and OR gates. In general, interconnections may be characterized by weights that describe the connection strengths, e.g., 0 or 1 (for off or on) or any real or complex number. Decisions may be characterized by inequality relationships relative to a set of threshold values, which may also be real or complex numbers. Thus interconnecting may be associated with weighting, and deciding may be associated with thresholding.

The functional location in space or time of the weighting and thresholding operations can be useful in classifying optical computing architectures [6]. It is appropriate to base such classification on characteristics of the decision or thresholding operations, because the most essential architectural differences are generally associated with the most difficult operations. Accordingly, a first category encompasses external-thresholding architectures, where thresholding operations are performed on the non-optical side of one or more optical-electronic interfaces and where weighting operations are performed by optical elements such as holograms, lenses, etc. For example, the now-classic Stanford matrix-vector multiplier [7] (used to perform discrete linear transforms, e.g., discrete Fourier transforms) generally employs a multiple-aperture mask and passive optics to implement interconnections and an electronically thresholded photodetector array to implement decisions. A second category encompasses internal-thresholding architectures, where at least some thresholding operations are all-optical and where, consequently, nonlinear devices with optical inputs and outputs are required. Thus internal-thresholding architectures contain all-optical nonlinear devices whereas external-thresholding architectures do not, and the classification separates decision operations performed optically from those performed electronically. This classification may be applied to digital or analog designs, or to unclocked or clocked (e.g., combinational, asynchronous-sequential, or synchronous-sequential) architectures, at levels of sophistication ranging from single gates to complete processors (e.g., gate, register, and processor levels) [8]. Note that architectures in both classes generally make essential use (through lenses, holograms, etc.) of the non-interfering interconnection capability of optics.

1.3 Standard and Non-Standard Threshold Logic

Threshold logic is a subject closely related to
thresholding and weighting operations in computing. It was an
active area of research in the 1960's and early 1970's [9-11],
but from then until very recently [12] it attracted little
attention, largely because conventional Boolean logic became
standard in all-electronic integrated circuit design.

Standard, or linear-inequality, threshold logic is
illustrated in Figure 1. Here the logic function
$y = x_1 \bar{x}_2 x_3 x_4 + \bar{x}_1 x_3$ (for which a truth
table is given) is implemented by a threshold logic element,
where x1, x2, x3, and x4 are binary inputs, y is a binary
output, the plus sign is an OR operation, the implied products
are AND operations, and the bar is a NOT operation. The element
multiplies each binary input by a real-valued weight (w1, w2,
w3, or w4), sums the results, and compares the sum with a
real-valued threshold T. If the sum is less than the threshold,
the output is zero; otherwise it is one. Note from the example
that a single threshold element implements a logic function
that would require several conventional Boolean gates and two
levels of logic (three if the NOT operation is included). Note
also that the threshold element will operate properly if the
threshold is any value satisfying 4 < T ≤ 5, and that non-zero
weight tolerances may be specified if the threshold tolerance
is further restricted.
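
The behavior of a single linear-inequality threshold element can
be checked with a short Python sketch. The weights of Figure 1
are not reproduced in this text, so the sketch uses one
hypothetical assignment, w = (-2, -1, 6, 1), that is consistent
with the stated window 4 < T ≤ 5; the function names are
illustrative only. It also shows the trade-off mentioned above:
at T = 4.5 each weight may drift by ±0.1, but at the edge of the
window (T = 5) no weight drift is tolerated.

```python
import itertools

def threshold_element(x, w, T):
    """Linear-inequality threshold element: 1 if sum(w_i * x_i) >= T, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= T else 0

def boolean_reference(x1, x2, x3, x4):
    """y = x1 (NOT x2) x3 x4 + (NOT x1) x3, written as two-level AND-OR logic."""
    return (x1 & (1 - x2) & x3 & x4) | ((1 - x1) & x3)

def realizes(w, T):
    """True if the element reproduces the reference truth table on all 16 rows."""
    return all(threshold_element(x, w, T) == boolean_reference(*x)
               for x in itertools.product((0, 1), repeat=4))

def weight_tolerant(w, T, delta):
    """True if every weight may drift by +/-delta without an output error.

    Each row's weighted sum is linear in the weights, so checking the
    corners of the drift box is sufficient.
    """
    return all(realizes([wi + d for wi, d in zip(w, shifts)], T)
               for shifts in itertools.product((-delta, delta), repeat=len(w)))

W = (-2, -1, 6, 1)  # hypothetical weights, not taken from Figure 1
print([realizes(W, T) for T in (4.0, 4.5, 5.0, 5.01)])
print(weight_tolerant(W, 4.5, 0.1), weight_tolerant(W, 5.0, 0.1))
```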

In general, linear-inequality threshold logic elements
have several binary inputs, one binary output, and an analog
internal mechanism that may be appropriate for optical
implementations (some element designs allow discrete internal
mechanisms [13] and multiple binary outputs [14]). As Figure 1
illustrates, proper operation of threshold elements can often
be made insensitive to fluctuations in nominal weight and
threshold values, which may occur due to environmental
conditions, fabrication variations, etc. Note from the truth
table in Figure 1b that appropriate weight and threshold values
may be obtained by solving a set of simultaneous linear
inequalities. For example, the twelfth row of the truth table
(x1 = 1, x2 = 0, x3 = 1, x4 = 1, with output y = 1) requires

$$ w_1 x_1 + w_3 x_3 + w_4 x_4 \ge T \tag{1} $$

Since there are generally more inequalities than unknowns (in
the example, 16 inequalities, one per truth table row, in the 5
unknowns w1, w2, w3, w4, and T), solutions will not always
exist. However, when solutions exist, those that maximize
various threshold or weight tolerances can be obtained using
linear programming techniques. Much early work in threshold
logic focused on finding such solutions or their
characteristics without explicitly using linear programming,
which can be computationally intensive. Advances in computer
hardware and algorithms, however, have made the use of linear
programming techniques [15] for the design of threshold logic
elements and networks much more attractive.

As indicated above, not all logic functions are realizable
using single linear-inequality threshold logic elements; those
that are realizable are called threshold or linearly separable
functions. In general, there are 2^(2^n) logic functions of n
binary variables (because each of the 2^n input truth table
rows can have either binary output), but the number of
threshold functions is usually much smaller; an upper bound is
2^(n^2+1)/n!. For example, if n = 3 the total number of
functions is 256, the above upper bound is 170, and the actual
number of threshold functions is 104 [10]. For n = 2 (the
simplest case) it may easily be shown that 14 of the 16
possible two-input Boolean logic gates, including AND and OR,
are realizable using single threshold elements (the exceptions
are exclusive OR and its complement), and thus linear-inequality
threshold logic may be viewed as a generalization of ordinary
Boolean logic. Since any combinational (constant truth table)
logic function may be realized using a network of gates or
elements with no more than two levels of Boolean logic (i.e.,
with input-output paths passing through no more than two logic
gates in series, not counting the NOT operation), it follows
that the same is true for threshold logic. However, Boolean
logic networks for complex functions (e.g., 16-bit
multiplication) generally require more than two logic levels
to avoid interconnecting impractically large numbers of logic
elements on the same level [16]. Threshold logic, and in
particular optical implementations of threshold logic, may
mitigate such requirements. This characterization and the
example of Figure 1 indicate that threshold logic has the
potential advantages of fewer logic levels (which implies
greater speed), fewer logic elements (which implies less power
consumption), and fewer interconnections (which implies lower
complexity).
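
The counts quoted above for n = 2 and n = 3 can be checked by
brute force. The sketch below is an exhaustive search rather
than the linear programming approach mentioned earlier: it
enumerates integer weights of magnitude at most 3 (an
assumption, taken to be sufficient for n ≤ 3) and half-integer
thresholds, and collects the distinct truth tables that single
elements produce.

```python
import itertools
import math

def threshold_functions(n, max_weight=3):
    """Distinct truth tables realized by one threshold element with small integer weights."""
    rows = list(itertools.product((0, 1), repeat=n))
    limit = n * max_weight                       # largest possible |weighted sum|
    thresholds = [t + 0.5 for t in range(-limit - 1, limit + 1)]
    tables = set()
    for w in itertools.product(range(-max_weight, max_weight + 1), repeat=n):
        sums = [sum(wi * xi for wi, xi in zip(w, row)) for row in rows]
        for T in thresholds:
            tables.add(tuple(int(s >= T) for s in sums))
    return tables

for n in (2, 3):
    total = 2 ** (2 ** n)                          # all Boolean functions of n variables
    bound = 2 ** (n * n + 1) / math.factorial(n)   # upper bound quoted in the text
    print(n, total, round(bound, 1), len(threshold_functions(n)))
```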

These potential advantages will usually be more
pronounced for the more general nonstandard, or
nonlinear-inequality, threshold logics. An element in such a
logic compares a threshold value with a nonlinear function of
the binary inputs, such as a quadratic polynomial whose
coefficients are considered to be weights [17]. As discussed in
Section 2, coherent optical systems can implement, for example,
a nonlinear function proportional to the squared magnitude of
the sum of the binary inputs multiplied by complex weights,
where the weights represent optical wave amplitudes and phases.
In this case it has been shown, using an exhaustive but
limited-resolution numerical search, that the number of logic
functions of n = 3 binary variables realizable by one threshold
element with complex weights and either inverted or
non-inverted output is at least 246 [18]. This number, when
compared with the 104 such functions that can be implemented
using standard threshold logic, indicates the increased "logic
power" of this nonstandard threshold logic. Some neural network
architectures for optical computing [4,5,19,20] may be
described in terms of such general threshold logics or may have
enhanced performance if such logics are employed. For example,
if the number of patterns of n binary pixels to be classified
into one of two categories is less than 2n, then well-known
adaptive methods based on the Widrow-Hoff algorithm or on
perceptron architectures can usually identify a linearly
separable function, or single standard threshold element, that
performs the classification [10]. However, if single nonlinear
threshold elements (of the sort that may have effective optical
implementations) are allowed, then the proper classification of
significantly more than 2n patterns may be possible [21].
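
A minimal sketch of the perceptron rule for a single standard
threshold element follows. It learns a linearly separable
function (three-input majority, chosen here only as an
illustration) and, as expected, never converges for the
non-separable exclusive OR. The function names and the epoch
limit are illustrative assumptions, and the Widrow-Hoff
(least-mean-squares) variant is not shown.

```python
import itertools

def train_perceptron(patterns, targets, max_epochs=1000):
    """Perceptron rule for one threshold element; returns (weights, T) or None."""
    w = [0.0] * len(patterns[0])
    T = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(patterns, targets):
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= T else 0
            if y != target:
                delta = target - y                      # +1 or -1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                T -= delta                              # threshold acts as a bias term
                errors += 1
        if errors == 0:
            return w, T                                 # every pattern classified correctly
    return None                                         # no convergence within max_epochs

majority_x = list(itertools.product((0, 1), repeat=3))
majority_y = [int(sum(x) >= 2) for x in majority_x]
xor_x = list(itertools.product((0, 1), repeat=2))
xor_y = [a ^ b for a, b in xor_x]

learned = train_perceptron(majority_x, majority_y)
print("majority:", learned)
print("XOR:", train_perceptron(xor_x, xor_y))   # None: XOR is not linearly separable
```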

2. EXTERNAL-THRESHOLDING DESIGNS

In external-thresholding designs, optical techniques are
used only to implement interconnection or weighting operations;
decision or thresholding operations are performed
electronically (e.g., by photodetectors with thresholded
amplifiers). The weighting operations are generally performed
by some combination of diffracting, refracting, and reflecting
elements in bulk or integrated optical implementations.
Generalized diffraction gratings or holograms are basic
components in models that use content-addressable memory
concepts and holographic table look-up techniques to generate
desired weighting patterns [22]. Sections 2.1 and 2.2 consider
two simple examples of these models: a design for a two-bit
multiplier and a design that realizes any logic function of
two variables.

2.1 Two-Bit Multiplier Example

A truth table for the multiplication of two two-bit
numbers x1x0 and y1y0 to obtain the four-bit product z3z2z1z0
is given in Figure 2. Suppose that each of the four input bits
is represented by 0 if it is zero and, if it is one, by
x1 = A1 exp(iφ1), x0 = A2 exp(iφ2), y1 = A3 exp(iφ3), and
y0 = A4 exp(iφ4). If these expressions correspond to waves in
an approximation where all source-source, source-detector, and
detector-detector distances are large compared to the
wavelength, then the two-bit multiplier may be designed as
shown in Figure 3a [6]. Here x0, x1, y0, and y1 are optical
point sources, and z0, z1, z2, and z3 are point photodetectors.
The lines indicate optical paths, each of which may have a
selected attenuation and phase shift that could be implemented
by a hologram or by integrated optical diffracting elements.

The required attenuations and phase shifts may be obtained
by solving sets of simultaneous nonlinear inequalities derived
from the truth table. For example, the 12th row and the z2
column of the table (boxed in Figure 2a), for which
x1 = y1 = y0 = 1 and x0 = 0 so that the product is 2 × 3 = 6
(binary 0110), imply a signal at photodetector z2 that must
equal or exceed a threshold T2:

$$ \left| A_1 e^{i\phi_1} + A_3 e^{i\phi_3} + A_4 e^{i\phi_4} \right|^2 \ge T_2 \tag{2} $$

Similar expressions may be obtained so that each of the four
output columns in the table (labeled z3, z2, z1, and z0) is
described by a set of 16 simultaneous nonlinear inequalities
(one for each table row) in 9 unknowns: four amplitudes (A1,
A2, A3, and A4), four phases (φ1, φ2, φ3, and φ4), and one
threshold (T1, T2, T3, or T4). Solutions are required for each
of these four overdetermined inequality sets (which involve
terms such as A1^2, 2A1A2 cos(φ1 − φ2), etc.) such that the
amplitudes, phases, and thresholds obtained all have acceptable
tolerances, i.e., ranges over which they may vary without
affecting proper two-bit multiplier operation.

One solution, which involves only phase shifts (no
attenuations) and which may have practical tolerances, is given
in Figure 3b [6], where the φ column gives the phase shifts
required for each of the four paths (in order) to each
detector, the T column gives the threshold value for each
detector when A1 = A2 = A3 = A4 = 1, and the ΔT/R column gives
the fraction of the total signal range on each detector over
which its threshold may vary. Figure 4 is a histogram of ΔT/R
for output z2, generated by selecting each of the four phases
for this output randomly from normal distributions with means
at their design values and standard deviations equal to
arctan(0.1). These standard deviations correspond to 10%
displacements of the phase vectors, and Figure 4 shows that
such variations reduce the threshold tolerance ΔT/R for output
z2 from 37% to about 20%. Similar acceptance tolerances may be
obtained for the other outputs.
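
The point-source model behind Equation (2) and the Figure 4
tolerance study can be sketched with NumPy. Here the detector
signal for each truth-table row is |Σ b_i A_i exp(iφ_i)|², and
ΔT/R is taken as the smallest "one" signal minus the largest
"zero" signal, divided by the full signal range. The Figure 3b
design values are not reproduced in this text, so the example
uses a hypothetical phase-only assignment for output z0
(= x0 AND y0): phases (π/2, 0, −π/2, 0) on the paths from
(x1, x0, y1, y0), which by hand gives ΔT/R = 0.4. All function
names are illustrative.

```python
import itertools
import numpy as np

# Truth-table rows in the order (x1, x0, y1, y0); row 1 is 0000, row 12 is 1011.
ROWS = np.array(list(itertools.product((0, 1), repeat=4)))

def product_bit(bit):
    """Column z_bit of the two-bit multiplier truth table."""
    x = 2 * ROWS[:, 0] + ROWS[:, 1]
    y = 2 * ROWS[:, 2] + ROWS[:, 3]
    return ((x * y) >> bit) & 1

def detector_signal(phases, amplitudes=(1.0, 1.0, 1.0, 1.0)):
    """|sum_i b_i A_i exp(i phi_i)|^2 at one detector, for every truth-table row."""
    weights = np.asarray(amplitudes) * np.exp(1j * np.asarray(phases))
    return np.abs(ROWS @ weights) ** 2

def threshold_tolerance(phases, target):
    """Delta T / R: usable threshold window as a fraction of the total signal range."""
    signal = detector_signal(phases)
    window = signal[target == 1].min() - signal[target == 0].max()
    return window / (signal.max() - signal.min())

def monte_carlo(phases, target, sigma=np.arctan(0.1), trials=1000, seed=0):
    """Redraw each phase from N(design value, sigma) and collect Delta T / R."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(phases, sigma, size=(trials, len(phases)))
    return np.array([threshold_tolerance(p, target) for p in samples])

z0_phases = np.array([np.pi / 2, 0.0, -np.pi / 2, 0.0])  # hypothetical design
z0 = product_bit(0)
print(threshold_tolerance(z0_phases, z0))
tolerances = monte_carlo(z0_phases, z0)
print(np.median(tolerances), (tolerances > 0).mean())
```

A negative ΔT/R in the Monte Carlo output means that no single
threshold separates the "one" rows from the "zero" rows for that
perturbed design.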

The two-bit multiplier design described above is based on
the ability of optics to provide noninterfering
interconnections which (a) are parallel, in that
interconnection time is essentially independent of
interconnection length or weight, and (b) lead to system
operation times essentially limited only by the response times
of sources or detectors. These interconnections may be provided
by passive diffracting elements in the form of one or more
ordinary bulk holograms, thin or thick, in which light
(coherent or possibly noncoherent (white) [23]) propagates
approximately normal to the hologram plane. They may also be
provided in integrated optical implementations by passive
diffracting elements formed on or near a substrate surface such
that light propagates approximately parallel to the surface.
Such integrated optical implementations could use surface-relief
or photorefractive mechanisms to form the diffracting elements
on GaAs, LiNbO3, glass, or other substrates. These
implementations have potential for realizing significant size,
power consumption, and reliability advantages, and for
providing real-time programmable interconnections or weightings
using electronically modulated diffracting element structures
[6,24].

Optically generated holograms can implement the
weightings required for certain truth tables or input-output
relationships (including the two-bit multiplier) in
external-thresholding systems. Using standard models of the
holographic process [25], it may be shown [6] that an L×P
output truth table matrix A is related to an N×P input truth
table matrix C by

$$ A = O R^{+} C \tag{3} $$

where O and R are L×M and N×M matrices describing the complex
amplitudes used in recording an M-fold exposed hologram, and +
is the conjugate transpose operation. An important aspect of
Equation (3) is that, although many exposures may be used to
record the hologram, the ability of the hologram to represent
input-output relationships is described by no more than the NL
complex elements of OR^+. In the two-bit multiplier, for
example, where N = L = 4 and P = 16, only 16 complex parameters
are available to relate 64 input bits to 64 output bits.

This suggests that not all possible truth tables are
realizable in an optically recorded hologram. An analogous
situation (discussed in Section 1.3) is that not all logic
functions can be implemented by single threshold logic
elements.

It would be useful to solve Equation (3), at least
approximately, for OR^+ in terms of C and A. This matrix
equation is generally overdetermined, and least-squares or
pseudoinverse methods might be used to obtain an approximate
solution. The (row-by-row) least-squares solution, which
exists when the N×N matrix CC^+ is nonsingular, is, for
example,

$$ O R^{+} \approx A C^{+} \left( C C^{+} \right)^{-1} \tag{4} $$

While this solution may not yield the desired truth table
realization in an external-thresholding system, it may serve
as a starting point for a steepest-descent or other computer
search for desired solutions. Such solutions should maintain
the desired input-output relationship when the matrix elements
are varied over acceptable tolerance ranges. The phase-only
solution for the external-thresholding two-bit multiplier given
in Figure 3b is such a solution and may be used to derive an
OR^+ matrix in which all elements have unit magnitude [6].
A particular implementation of this solution for holographic
recording is O = the 4×4 identity matrix and R = (OR^+)^+.
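
Equation (4) and the recording choice O = I, R = (OR^+)^+ can be
checked numerically for the two-bit multiplier. In the sketch
below, input bits are represented simply as 0 and 1 (an
assumption; the optical design represents a one by a complex
amplitude), the least-squares OR^+ is compared with NumPy's
general least-squares solver, and the nonzero residual confirms
that a purely linear fit does not reproduce the truth table
exactly, which is why it serves only as a starting point for a
search.

```python
import itertools
import numpy as np

rows = list(itertools.product((0, 1), repeat=4))        # (x1, x0, y1, y0)
C = np.array(rows, dtype=complex).T                     # N x P input matrix, 4 x 16
A = np.array([[(((2 * x1 + x0) * (2 * y1 + y0)) >> b) & 1 for x1, x0, y1, y0 in rows]
              for b in (3, 2, 1, 0)], dtype=complex)    # L x P output matrix, rows z3..z0

def least_squares_OR(A, C):
    """Equation (4): OR^+ ~= A C^+ (C C^+)^-1, with ^+ the conjugate transpose."""
    C_h = C.conj().T
    return A @ C_h @ np.linalg.inv(C @ C_h)

OR = least_squares_OR(A, C)

# Recording choice from the text: O = 4x4 identity and R = (OR^+)^+, so O R^+ = OR^+.
O = np.eye(4)
R = OR.conj().T

print(np.round(OR.real, 3))
print("residual:", np.linalg.norm(OR @ C - A))
```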

Note that although the above analysis implies
three-dimensional holographic systems, integrated optical
assemblies of diffracting elements similar in function to bulk
holograms might be used. This possibility is related to the
observation that the multiple truth table "images" to be
recorded and reconstructed, although often highly
cross-correlated, may be relatively simple, low-resolution
bright-spot/dark-spot patterns.

2.2 Two-Binary-Variable Example

Optical or computer-generated holographic synthesis of
weighting or interconnecting operations will generally require
knowledge of the diffracting patterns on the hologram that
modify amplitudes and phases to yield the correct truth table
input-output behavior with maximum weight and threshold
tolerances. In the case of the geometrical-optics two-bit
multiplier design of Figure 3, expressions governing
input-output behavior were easily obtained. This favorable
situation may be uncommon in the design of the generally
smaller, more efficient external-thresholding systems for which
geometrical-optics approximations do not apply.

Consider, for example, the derivation of eight far-field
holograms, each with the same two design parameters, that
implement, in an external-thresholding system, the eight
positive-threshold functions of two Boolean variables (i.e.,
the eight of the sixteen functions for which two zero inputs
yield a zero output). Figure 5 shows a simple format [6]
consisting of a screen with two pinholes separated by a
distance y. One pinhole is covered by a phase-shifting film φ;
there is a detector d and lower and upper mutually coherent
point sources l and u. In the far-field approximation the
distances b and y and the wavelength λ = 2π/k must be small
compared to the distance s. With this approximation and with b
fixed, the problem reduces to finding values of y and φ such
that the detected signals I_l (only source l on), I_u (only
source u on), and I_b (both sources on) can take on all six
possible inequality relationships. Referring to Figure 5 for
definitions, the following approximate expressions may be
derived:

$$
\begin{aligned}
A_l &= \exp\{i[k(w+s)]\} + \exp\{i[k(v+r) + \phi]\} \\
A_u &= \exp\{i[k(w+s)]\} + \exp\{i[k(u+r) + \phi]\} \\
I_l &= |A_l|^2 \approx (ks)^2 (x^2 + ax)^2 + 2\eta (ks)(x^2 + ax) + \eta^2 \\
I_u &= |A_u|^2 \approx (ks)^2 (x^2 - ax)^2 + 2\eta (ks)(x^2 - ax) + \eta^2 \\
I_b &= |A_l + A_u|^2 \approx 2(ks)^2 \left[ (x^2 + ax)^2 + (x^2 - ax)^2 \right]
      + 8\eta (ks) x^2 + 4\eta^2 - 4(ks)^2 (ax)^2
\end{aligned}
\tag{6}
$$

where x = y/s, a = b/s, and η = φ − π ≈ 0. Figure 6 is a graph
[6] of the approximate expressions for I_l, I_u, and I_b versus
x for λ = 628 nm, b = 10 μm, s = 10 cm, and η = 0.004. Note
that four of the six inequality relationships can be satisfied
using the plotted values; the other two relationships can be
satisfied for other values of η.
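
The approximate expressions of Equation (6) are easy to evaluate
numerically. The sketch below uses the Figure 6 parameter values
and lists which ordering of I_l, I_u, and I_b occurs somewhere
in a scan over x; the scan range is an assumption, since the
plotted range is not given in this text. As written, I_l and I_u
are the squares of ks(x² ± ax) + η, and I_b reduces to
4[(ks)x² + η]², which the test checks.

```python
import numpy as np

def pinhole_signals(x, wavelength=628e-9, b=10e-6, s=0.10, eta=0.004):
    """Approximate signals I_l, I_u, I_b of Equation (6); x = y/s, a = b/s."""
    ks = (2 * np.pi / wavelength) * s
    a = b / s
    I_l = ks**2 * (x**2 + a * x)**2 + 2 * eta * ks * (x**2 + a * x) + eta**2
    I_u = ks**2 * (x**2 - a * x)**2 + 2 * eta * ks * (x**2 - a * x) + eta**2
    I_b = (2 * ks**2 * ((x**2 + a * x)**2 + (x**2 - a * x)**2)
           + 8 * eta * ks * x**2 + 4 * eta**2 - 4 * ks**2 * (a * x)**2)
    return I_l, I_u, I_b

def ordering(i_l, i_u, i_b):
    """Name the inequality relationship, e.g. 'I_b > I_l > I_u'."""
    values = {"I_l": i_l, "I_u": i_u, "I_b": i_b}
    return " > ".join(sorted(values, key=values.get, reverse=True))

x = np.linspace(-2e-4, 2e-4, 2001)   # assumed scan range for x = y/s
I_l, I_u, I_b = pinhole_signals(x)
found = {ordering(il, iu, ib) for il, iu, ib in zip(I_l, I_u, I_b)}
for relation in sorted(found):
    print(relation)
```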

The example of Equation (6) indicates the possible
complexity of general (physical optics) external-thresholding
synthesis. Greater complexity may be anticipated if Fresnel
rather than Fraunhofer diffraction conditions are allowed and
if the input-output truth tables are large. One approach to
such synthesis problems is to perform additional
post-photodetection processing and to employ logical reduction
and residue arithmetic techniques to reduce the amount of
information that must be stored for holographic look-up [22].
A more general approach is to seek alternatives to requirements
for conversion into and out of residue arithmetic and for
additional all-electronic processing.

This approach could involve (a) obtaining the generally
large sets of overdetermined simultaneous nonlinear
inequalities that fully describe a desired external-thresholding
system, (b) finding optimal solutions for these sets, perhaps
with respect to weight and threshold tolerances, using
nonlinear programming techniques [26], and (c) identifying
optical systems, perhaps based on computer-generated holograms,
that implement the solutions. In the case of optically recorded
holograms, recent work indicates that control of the relative
phases of the truth table look-up reference beams is important
for obtaining selected reconstructions without interference
from nonselected reconstructions [27]. Other recent work has
shown that such interference can be reduced by techniques
involving gain competition in nonlinear optical resonators
which contain, for example, phase-conjugating mirrors [28].
These techniques clearly involve optically implemented
decisions and thus fall in the internal-thresholding category.

3. INTERNAL-THRESHOLDING DESIGNS

Internal-thresholding designs have optical implementations
for decision as well as interconnection operations. They are
therefore more general than external-thresholding designs, for
which optically implemented decision operations are not
permitted. As indicated in Section 1, optics-based systems that
implement decision operations need not make use of the same
Boolean logic gate networks that have become conventional in
all-electronic integrated circuits. In particular, threshold
logic constitutes a more general approach that includes
conventional Boolean logic as a special case and that requires
no more, and usually significantly fewer, logic levels,
elements, and interconnections to carry out the same function.

Two simple examples of internal-thresholding designs, one
(a multiplier-adder) involving combinational logic and one (a
J-K flip-flop) involving sequential logic, are considered in
Sections 3.1 and 3.2. These designs may have effective
integrated (or near-integrated) optical implementations
involving nonlinear optical (e.g., bistable) devices. Since
optical-electronic (or electronic-optical) conversions are
generally costly in terms of speed, power consumption, device
size, etc., it is anticipated that these implementations will
require all-optical (or nearly all-optical) internal
connections to realize substantial performance improvements
compared to all-electronic designs.

3.1 Multiplier-Adder Example

This example was motivated by the need for high-speed and
otherwise superior inner-product-step operations in linear
algebra (e.g., for matrix-vector or matrix-matrix
multiplication). The inner product step involves multiplying
two numbers and adding a third number. A two-bit
multiplier-adder, for example, multiplies two two-bit input
numbers M and N, adds the result to a five-bit input number X,
and outputs the result as a five-bit number Y. In clocked
operation, the output Y could be fed back to the input X to
achieve a multiply-accumulate result with the capability of
accumulating up to three products without overflow (each
product is at most 3 × 3 = 9, so three products total at most
27, which fits in five bits, whereas four could reach 36 > 31).

A conventional Boolean logic design for a two-bit
multiplier-adder is diagrammed in Figure 7, where the
subscripts on M, N, X, and Y designate binary number positions
(2^0, 2^1, etc.). Note that two well-known and frequently
occurring multigate configurations have been grouped and
represented by single symbols. The exclusive OR (XOR) function
may be carried out in two logic levels using two AND gates, one
OR gate, and two inverters. The full-adder function may be
carried out in two logic levels with a minimum of five AND
gates, two OR gates, and four inverters. A threshold logic
design for a two-bit multiplier-adder [29,30] is diagrammed in
Figure 8; it is composed entirely of threshold logic elements,
with fan-in and fan-out limited to five. Weights are indicated
inside each element symbol adjacent to the input lines, and the
threshold is indicated adjacent to the output line.
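
Threshold-logic versions of the XOR and full-adder building
blocks can be sketched in Python. The element choices below (a
majority element for the carry, and a sum element that takes
the carry back in with weight −2) are a common textbook
construction, not a reproduction of the Figure 8 design; the
two-bit multiplier-adder assembled from them is a hypothetical
ripple-carry arrangement used only to check the arithmetic,
including the three-product accumulation limit.

```python
def step(total, threshold):
    """Threshold logic element: 1 if the weighted input sum reaches the threshold."""
    return 1 if total >= threshold else 0

def t_and(a, b):
    return step(a + b, 2)                    # weights (1, 1), threshold 2

def t_xor(a, b):
    h = t_and(a, b)                          # first level
    return step(a + b - 2 * h, 1)            # second level: weights (1, 1, -2), threshold 1

def full_adder(a, b, c):
    carry = step(a + b + c, 2)               # majority element
    total = step(a + b + c - 2 * carry, 1)   # weights (1, 1, 1, -2), threshold 1
    return total, carry

def bits(value, width):
    return [(value >> i) & 1 for i in range(width)]   # least significant bit first

def multiply_add(m, n, x):
    """Y = M * N + X for 2-bit M, N and 5-bit X (result kept to 5 bits)."""
    m0, m1 = bits(m, 2)
    n0, n1 = bits(n, 2)
    a, b = t_and(m1, n0), t_and(m0, n1)      # the two partial products of weight 2
    c1 = t_and(a, b)
    hi = t_and(m1, n1)
    product = [t_and(m0, n0), t_xor(a, b), t_xor(hi, c1), t_and(hi, c1), 0]
    y, carry = [], 0
    for p_bit, x_bit in zip(product, bits(x, 5)):
        s, carry = full_adder(p_bit, x_bit, carry)
        y.append(s)
    return sum(bit << i for i, bit in enumerate(y))

acc = 0
for _ in range(3):                           # accumulate three worst-case products
    acc = multiply_add(3, 3, acc)
print(acc)                                   # 27, still below 2**5 = 32
```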

Note that the Boolean logic design in Figure 7 requires a
total of 38 logic gates and 18 inverters. (Inverters are not
normally included in logic element or level counts; however,
they do require space, time, and power.) This design has a
maximum propagation path of 9 logic levels. The threshold logic
design in Figure 8 requires 18 threshold logic elements and
involves only 5 logic levels. It is also of interest to compare
the number of interconnect lines required by the designs: 116
for the Boolean logic design versus 70 for the threshold logic
design. The comparison may thus be summarized by stating that
the threshold logic design is superior by factors of roughly
two with regard to number of logic levels, number of logic
elements, and number of interconnections.

A similar design comparison for a perhaps more useful
case considers an 8-bit multiplier-adder that multiplies two
8-bit numbers, adds a 21-bit number, and outputs a 21-bit
result. In this case, the fan-in and fan-out constraints were
increased to eight. The results of this design (with the
two-bit multiplier-adder results in parentheses) are summarized
in Figure 9 [29,30]. Note that the threshold logic advantage in
gate count ratio has increased to almost three-to-one, while
the logic level and interconnection ratios have remained at
about two-to-one. This design is reasonably complex, and one is
tempted to conclude that the results are indicative of what may
be obtained for more general complex designs. If viewed in
terms of the ratio of processing speed to power consumption, it
can be argued that the results also indicate an advantage
considerably greater than the separate level, element, or
interconnection ratios alone, since these ratios each
contribute to increased processing speed, reduced power
consumption, or both.

As indicated above, it is anticipated that all-optical
elements and internal connections will be required for
internal-thresholding systems to realize their potential for
commanding and enduring performance advantages compared to
all-electronic designs. If size advantages are also to be
realized, integrated or near-integrated [31] optical technology
constitutes one viable approach. Among the many material
systems that have been investigated in this technology (LiNbO3,
glass/SiO2/Si, etc.), GaAs/GaAlAs systems have the best
potential for the complete integration of optical sources,
thresholding devices, and detectors on one substrate. A problem
is that the material nonlinearities required for thresholding
operations using all-optical devices are two to four orders of
magnitude above the values that characterize current uniform
electro-optic materials, for the case where the optical input
is provided by laser diode sources [32,33]. However, GaAs/GaAlAs
multiple quantum well (MQW) structures may have the required
nonlinearity properties at room temperature, with favorable
switching speed and energy characteristics, and in dense arrays
on GaAs integrated optical structures [34]. Other promising
material systems are also being investigated, including systems
based on InSb [35,36] and on multiple-layer Langmuir-Blodgett
organic films [37].


3.2 J-K Flip-Flop Example

The J-K flip-flop is a basic unit in conventional
sequential logic architectures that is clocked synchronously by
an external source or asynchronously by internal component time
delays. It has two inputs, J and K, and an output Q: if both J
and K are 0, the output is the previous value of Q; if only J
is 0, Q is 0; if only K is 0, Q is 1; and if both J and K are
1, Q is the complement of its previous value. Figure 10 shows
how a J-K flip-flop could be implemented using a system in
which "femtosecond pancakes" of light store previous output
values in an optical feedback loop [18]. Note that a maximum of
six optical paths and three all-optical thresholding devices
are required, and that the corresponding weight and threshold
values for J-K flip-flop operation are indicated.
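
The J-K behavior described above can be checked with a sketch
that uses three threshold elements and six weighted paths,
matching the counts quoted for Figure 10. The particular
weights and thresholds below are a hypothetical assignment
chosen for illustration, not the values given in Figure 10; the
fed-back variable q plays the role of the stored "pancake".

```python
def step(total, threshold):
    """Threshold element: 1 if the weighted input sum reaches the threshold."""
    return 1 if total >= threshold else 0

def jk_next(j, k, q):
    """Next output from inputs J, K and the fed-back previous output q."""
    g1 = step(j - q, 1)        # J AND NOT Q: paths from J (+1) and Q (-1)
    g2 = step(q - k, 1)        # Q AND NOT K: paths from Q (+1) and K (-1)
    return step(g1 + g2, 1)    # OR: paths from g1 and g2 (+1 each)

def run(inputs, q=0):
    """Clock the flip-flop; q is carried around the feedback loop between periods."""
    outputs = []
    for j, k in inputs:
        q = jk_next(j, k, q)
        outputs.append(q)
    return outputs

print(run([(1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]))  # [1, 1, 0, 1, 0]: set, hold, toggle, toggle, reset
```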

The "femtosecond pancake" architecture [20,38,39] is
equivalent to a network of threshold logic elements, as in
Figure 8, but with feedback. Since this architecture is among
the most general and powerful designs, its characteristics and
potential are discussed more generally below.

4. SUMMARY

The "femtosecond pancake" architecture, for which Figure 11 gives a general outline, is a suitable design for summarizing key issues related to thresholding and weighting in optical computing. In this design, input data, including control and programming information, is spatially and temporally encoded on an input beam. An optical interconnection array (containing holograms, lenses, etc.) performs weighting operations, and an array of nonlinear optical devices (generally with gain) performs thresholding operations. A feedback optical path makes the overall design a sequential computing system in which spatially and temporally modulated "femtosecond pancakes" of light may circulate in a pipeline manner. Timing is performed either asynchronously, with a clock period related to the optical feedback loop time, or synchronously, using external clock inputs. Electrical inputs to the nonlinear device array and to the interconnection array (perhaps through electro-optically controlled gratings for the latter) may also supplement the optical input, control, and programming data at relatively low data rates.
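Logically, the weighting, thresholding, and feedback cycle described above behaves like a recurrent threshold network: on each pass around the loop, the interconnection array forms weighted sums of the circulating state and any external input, and the nonlinear device array thresholds them. The sketch below is a purely logical model under that assumption (no optics, timing, or gain is simulated), and the function and matrix names are illustrative. With a cyclic-shift interconnection and no external input, a single "pancake" pattern simply circulates around the loop.

```python
import numpy as np


def loop_pass(state, external, w_feedback, w_input, thresholds):
    """One pass around the loop: weight (interconnect array), then threshold (device array)."""
    drive = w_feedback @ state + w_input @ external
    return (drive >= thresholds).astype(int)


def circulate(state, w_feedback, w_input, thresholds, externals):
    """Apply loop_pass once per external input vector and record every state."""
    history = [np.asarray(state)]
    for ext in externals:
        history.append(loop_pass(history[-1], np.asarray(ext), w_feedback, w_input, thresholds))
    return history


# Hypothetical 3-element loop whose interconnection is a cyclic shift.
shift = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])
no_input = np.zeros((3, 1), dtype=int)   # external input is not connected
states = circulate([1, 0, 0], shift, no_input, np.ones(3), [[0]] * 3)
print([s.tolist() for s in states])
# [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Replacing `shift` and `np.ones(3)` with other weight matrices and thresholds turns the same loop into other sequential machines.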

As noted in Section 3.2, the "femtosecond pancake" architecture is equivalent to a network of threshold logic elements with feedback. It is also equivalent in some respects to designs in which all-electronic logic gates are interconnected using optical techniques, e.g., the intra-chip optical interconnection of VLSI circuits [40]. Finally, this architecture has recently been investigated in connection with neural networks [41,42], which, as mentioned in Section 1.1, can have exceptionally "intelligent" and flexible performance characteristics. For any of these interpretations (and generally for architectures that use arrays of nonlinear optical devices), power dissipation is currently a performance-limiting factor. For example, a maximum power dissipation of 10 W/cm², a minimum device area of 10 μm², and a minimum device switching energy per unit area of 1 fJ/μm², which are reasonable projections of current GaAs-based technology [32], imply that on the order of 10^15 logical operations per second per square centimeter (10^15 gate-Hz/cm²) could be performed (each switching event dissipates 10 μm² × 1 fJ/μm² = 10 fJ, and 10 W/cm² ÷ 10 fJ = 10^15 events per second per cm²), which is two orders of magnitude higher than present VHSIC program goals. However, if power dissipation is of no concern, as may be the case for low-duty-cycle or "burst" operation, then minimum device switching times of 10 ps imply on the order of 10^18 gate-Hz/cm² (10^7 devices per cm², each switching every 10 ps).
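Both estimates follow from simple arithmetic: when dissipation is the limit, the rate per cm² is the power budget divided by the energy of one switching event; when it is not, the rate is the number of devices per cm² divided by the switching time. The sketch below reproduces both with the parameter values quoted above. The function names are illustrative, and the model assumes that the devices tile the area completely.

```python
def power_limited_rate(max_power_w_per_cm2, energy_fj_per_um2, device_area_um2):
    """Gate-Hz/cm^2 when dissipation limits operation: power / energy per switch."""
    energy_per_switch_j = energy_fj_per_um2 * 1e-15 * device_area_um2
    return max_power_w_per_cm2 / energy_per_switch_j


def speed_limited_rate(device_area_um2, switch_time_s):
    """Gate-Hz/cm^2 when every device switches as fast as it can."""
    devices_per_cm2 = 1e8 / device_area_um2   # 1 cm^2 = 1e8 um^2
    return devices_per_cm2 / switch_time_s


print(f"{power_limited_rate(10, 1, 10):.1e}")    # about 1e15
print(f"{speed_limited_rate(10, 10e-12):.1e}")   # about 1e18
```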

This chapter has discussed the potential of optical computing in terms of weighting and thresholding operations and has presented simple examples of external-thresholding and internal-thresholding designs. A major point is that optics-based technology excels in performing interconnection or weighting operations but may have no general advantage over all-electronic technology in performing decision or thresholding operations. A research goal is thus to identify optical computing architectures that make maximum use of the interconnection and related advantages of optics to realize commanding and enduring performance characteristics.
Example of a threshold logic element that implements the function $y = \bar{x}_1 x_2 x_3 x_4 + x_1 x_3$. (a) Threshold element, (b) truth table.
Figure 2. (a) Two-bit multiplier truth table, (b) nonlinear inequality for boxed entries in truth table.




External-thresholding two-bit multiplier. (a) Interconnections from sources x and y to detectors z; (b) no-attenuation solution for interconnection phases, detection thresholds T, and threshold tolerances.
pseudorandom shift register sequences

Abstract

The coding of digital images using linear feedback shift register (LFSR) sequences is considered analytically and in computer simulations. It is shown that LFSR sequences can provide efficient representations of binary and multilevel images in terms of a relatively small set of integers related (in limiting cases) to image complexity and randomness.

Introduction 

Pseudorandom linear feedback shift register (LFSR) sequences are commonly employed in the characterization of one-dimensional communication signals, but no extensive use has been made of these sequences for characterizing images. This brief paper shows that in some situations LFSR sequences can provide efficient representations of binary and multilevel images in terms of a relatively small set of integers related (in limiting cases) to image complexity and randomness. The basic concept for a binary image is as follows: the image is coded into a binary sequence using the MacWilliams-Sloane¹ or another pseudorandomness-preserving construction. The Berlekamp-Massey algorithm² is then used to determine the lowest-order (i.e., smallest number of stages) LFSR that could have generated the coded sequence. This determination provides the coefficients in a two-element-field autoregression of the coded sequence or, equivalently, a set of positive integers that represent the generating LFSR feedback connection positions. In the limiting case where 2^n - 1 image pixels are characterized by an n-stage LFSR, the representation may have minimum complexity and maximum randomness. Some potential applications for this characterization are in the areas of data compression (e.g., texture coding) and coded aperture imaging⁴ (e.g., determination of optimum aperture patterns). The limited objectives considered below include (1) finding the smallest-stage-number LFSR sequences that could have generated small random binary arrays, (2) obtaining histograms of stage number and period length, and (3) making observations on image coding using maximum-length and non-maximum-length LFSRs.


Figure 1 is an example of a binary maximum-length LFSR. Here the contents of the second and third of the n = 3 stages (represented by squares) are added modulo 2 (equivalent to an Exclusive-Or operation) and fed back to the first stage. As in any shift register, the contents of each stage are shifted to the right at each clock cycle. The LFSR then cycles through all 2^n - 1 nonzero states (e.g., the seven rows of stage contents indicated in Figure 1) before repeating, regardless of the initial (nonzero) state. The LFSR output is the contents of the last stage, and the LFSR is called maximum-length because its output pattern has the longest possible repeat period.
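The register of Figure 1 can be simulated in a few lines. The sketch below assumes stage 1 is the leftmost element of the state tuple and starts from the arbitrary nonzero state (1, 0, 0), which may differ from the first row drawn in Figure 1; the function name is illustrative.

```python
def lfsr_states(initial, taps, steps):
    """Return successive register contents of a binary shift-right LFSR.

    initial: tuple of stage contents, stage 1 first.
    taps: 1-based stage numbers whose contents are XORed (added modulo 2)
          and fed back to stage 1. Each clock shifts every stage one place right.
    """
    state = tuple(initial)
    history = []
    for _ in range(steps):
        history.append(state)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = (feedback,) + state[:-1]
    return history


# Figure 1 arrangement: n = 3, stages 2 and 3 fed back to stage 1.
states = lfsr_states((1, 0, 0), taps=(2, 3), steps=8)
output = [s[-1] for s in states[:7]]   # the output is the last stage
print(output)  # [0, 0, 1, 0, 1, 1, 1]
```

The eighth state equals the first, and the seven states in between are exactly the seven nonzero 3-bit patterns.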

In general, the stages may be assigned q levels, where q is a prime number or a power of a prime. Three properties of the output of a general (q-level) maximum-length LFSR are necessary for a pseudorandom output:¹

(1) The pattern repeat period is q^n - 1 output values, q^(n-1) - 1 of which are zero and q^n - q^(n-1) of which are nonzero.

(2) A window of n consecutive output values, placed at each of the q^n - 1 different positions within one period of the output sequence, will contain each of the q^n - 1 nonzero n-tuples exactly once.

(3) The autocorrelation function of the output sequence is

$$
C(\tau) =
\begin{cases}
1, & \tau = 0, \\
-1/(2^n - 1), & q = 2,\ \tau \ne 0 \ (1 \le \tau \le 2^n - 2), \\
-1/(q^n - 1), & q \ne 2,\ \tau \ne 0 \ (1 \le \tau \le q^n - 2).
\end{cases}
$$
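For the binary case the three properties can be checked numerically. The sketch below generates one period of a maximum-length sequence with a shift-right register (taps are 1-based stage numbers fed back to stage 1), then counts zeros, collects all cyclic windows of n values, and computes the periodic autocorrelation with the mapping 0 → +1, 1 → -1 (an assumption; the text does not state the mapping). Exact fractions avoid rounding. The tap sets (2, 3) for n = 3 and (3, 4) for n = 4 are maximum-length choices; names are illustrative.

```python
from fractions import Fraction


def m_sequence(n, taps, seed):
    """One period (2**n - 1 values) of a binary LFSR output (last stage).

    The caller must choose taps that give a maximum-length register.
    """
    state = list(seed)
    out = []
    for _ in range(2**n - 1):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return out


def check_properties(seq, n):
    """Return (zero count, set of cyclic n-windows, autocorrelation list)."""
    period = len(seq)
    zeros = seq.count(0)                                    # property (1)
    windows = {tuple(seq[(i + j) % period] for j in range(n))
               for i in range(period)}                      # property (2)

    def autocorr(tau):                                      # property (3)
        agree = sum((-1) ** (seq[i] ^ seq[(i + tau) % period]) for i in range(period))
        return Fraction(agree, period)

    return zeros, windows, [autocorr(t) for t in range(period)]


seq = m_sequence(3, (2, 3), (1, 0, 0))
zeros, windows, c = check_properties(seq, 3)
print(seq, zeros, len(windows), c[:3])
```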
MacWilliams-Sloane pseudorandom array construction


A pseudorandom array construction that preserves the above three properties in two dimensions is as follows:

(1) Select a maximum-length LFSR sequence with n = 4, 6, 8, ... stages.

(2) Fill a (2^(n/2) - 1)-by-(2^(n/2) + 1) array, which has exactly 2^n - 1 cells, with this sequence along the main diagonal, using contiguous-array continuation. Figure 2 shows the placement of sequence positions 0, 1, 2, ... on a 3 × 5 array (n = 4).
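A minimal sketch of step 2 follows, assuming that "contiguous-array continuation" means the diagonal wraps around cyclically, so that sequence position k lands in row k mod 3 and column k mod 5. Because 2^(n/2) - 1 and 2^(n/2) + 1 are coprime, the Chinese remainder theorem guarantees that every cell is filled exactly once, and reading the cells back in the same order recovers the sequence. The 4-stage sequence uses the recurrence a[k+4] = a[k+1] XOR a[k] (feedback polynomial x^4 + x + 1, a maximum-length choice); the function names are illustrative.

```python
def fold_diagonally(seq, rows, cols):
    """Place seq[k] at (k mod rows, k mod cols): the main diagonal, wrapped around."""
    assert rows * cols == len(seq), "array must have one cell per sequence value"
    array = [[None] * cols for _ in range(rows)]
    for k, value in enumerate(seq):
        array[k % rows][k % cols] = value
    return array


def read_diagonally(array):
    """Inverse of fold_diagonally: read the cells back in diagonal order."""
    rows, cols = len(array), len(array[0])
    return [array[k % rows][k % cols] for k in range(rows * cols)]


# One period of a 4-stage maximum-length sequence.
seq = [1, 0, 0, 0]
while len(seq) < 15:
    seq.append(seq[-3] ^ seq[-4])

array = fold_diagonally(seq, 3, 5)   # 2**2 - 1 = 3 rows, 2**2 + 1 = 5 columns
for row in array:
    print(row)
```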


Note that step 2 may be used to construct an array from a sequence, or vice versa, even if the sequence is not from a maximum-length LFSR. However, in this case the corresponding array will not in general have the three necessary properties of pseudorandomness.


Berlekamp-Massey algorithm


The Berlekamp-Massey algorithm efficiently identifies the LFSR with the least number of stages that could have generated a given data sequence. It requires 2n error-free consecutive sequence values to identify an n-stage maximum-length LFSR. It may be viewed as an elegant procedure for solving n autoregression equations in n unknowns over a q-element field, where q is finite or infinite. The algorithm involves on the order of 2n² arithmetic operations⁹ and is presented in outline⁰ in Figure 3.
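For the binary (q = 2) case, a standard textbook formulation of the algorithm is sketched below; it is not necessarily identical to the outline in Figure 3. It returns the LFSR length L and the connection polynomial coefficients [1, c1, ..., cL], meaning that s[k] = c1·s[k-1] XOR ... XOR cL·s[k-L] for every k ≥ L. Applied to the 3-stage output 0, 0, 1, 0, 1, 1, 1, it recovers a 3-stage register whose taps correspond to stages 2 and 3, as in Figure 1.

```python
def berlekamp_massey(s):
    """Shortest binary LFSR generating sequence s (list of 0/1).

    Returns (L, c) where c = [1, c1, ..., cL] is the connection polynomial.
    """
    n = len(s)
    c = [1] + [0] * n   # current connection polynomial C(x)
    b = [1] + [0] * n   # C(x) before the last length change
    L, m = 0, 1         # LFSR length; steps since the last length change
    for i in range(n):
        # Discrepancy between s[i] and the value the current LFSR predicts.
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d == 0:
            m += 1
            continue
        t = c[:]
        for j in range(n + 1 - m):   # C(x) <- C(x) + x^m B(x)
            c[j + m] ^= b[j]
        if 2 * L <= i:               # the register must grow
            L, b, m = i + 1 - L, t, 1
        else:
            m += 1
    return L, c[:L + 1]


print(berlekamp_massey([0, 0, 1, 0, 1, 1, 1]))  # (3, [1, 0, 1, 1])
```

With the image coding described in the introduction, the L tap coefficients plus the first L sequence values (the initial loading) are enough to regenerate the whole coded sequence.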


Example image coding: stage number and period length histograms


Figure 4 shows two example histograms of LFSR stage number for random bits that could be placed in a 3-x-5 array and read as sequences using the MacWilliams-Sloane construction.

The minimum stage number was determined by the Berlekamp-Massey algorithm for two situations. In the first, "Stage number for one case," the algorithm was applied to sequences read as shown in Figure 2 from 300 random binary arrays. (Bits in these arrays were assigned by a "good" computer random number generator.⁷) In the second, "Minimum stage number for 60 cases," the algorithm was applied to sequences read 60 times from each of the 300 random binary arrays, and the minimum stage number was selected. Here the 60 cases were defined by the four possible array diagonal directions and by the 15 possible array starting positions. In the first situation, the 3-x-5 array required a mean of m = 7.60 ± 1.05 stages for LFSR representation. The corresponding image coding is specified by 2m bits (m to specify the LFSR tap connections and m to specify the LFSR initial loading), and thus may not be preferred to direct specification of the 15 image bits. In the second situation, the 3-x-5 array required a mean of 6.31 ± 0.87 bits for LFSR representation. The corresponding image coding may thus be advantageous compared to direct image bit specification.
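The two situations can be re-created in outline as follows. This is a hypothetical reconstruction of the procedure, not the original program: it assumes the four diagonal directions are (±1, ±1) steps with wraparound, uses Python's `random` module with an arbitrary seed in place of the original generator, and repeats a compact binary Berlekamp-Massey routine so that the block is self-contained. The resulting means will therefore be close to, but not identical with, the values reported above.

```python
import random


def linear_complexity(s):
    """Binary Berlekamp-Massey; returns only the minimal LFSR length L."""
    n = len(s)
    c, b = [1] + [0] * n, [1] + [0] * n
    L, m = 0, 1
    for i in range(n):
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d == 0:
            m += 1
            continue
        t = c[:]
        for j in range(n + 1 - m):
            c[j + m] ^= b[j]
        if 2 * L <= i:
            L, b, m = i + 1 - L, t, 1
        else:
            m += 1
    return L


def diagonal_reads(array):
    """All 60 readings of a 3-x-5 array: 4 diagonal directions x 15 starting cells."""
    rows, cols = len(array), len(array[0])
    for dr, dc in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        for r0 in range(rows):
            for c0 in range(cols):
                yield [array[(r0 + dr * k) % rows][(c0 + dc * k) % cols]
                       for k in range(rows * cols)]


def mean(xs):
    return sum(xs) / len(xs)


def experiment(trials=300, seed=1):
    """Return (mean stage number for one case, mean minimum over 60 cases)."""
    rng = random.Random(seed)
    one_case, best_of_60 = [], []
    for _ in range(trials):
        array = [[rng.randint(0, 1) for _ in range(5)] for _ in range(3)]
        reads = list(diagonal_reads(array))
        one_case.append(linear_complexity(reads[0]))   # Figure 2 reading order
        best_of_60.append(min(linear_complexity(r) for r in reads))
    return mean(one_case), mean(best_of_60)


m1, m60 = experiment()
print(f"one case: {m1:.2f} stages, best of 60: {m60:.2f} stages")
```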


Figure 5 shows histograms of LFSR period length for random bits that could be placed in a 3-x-5 array and read as sequences using the MacWilliams-Sloane construction. The period lengths are for minimum-stage-number LFSRs determined as in the "One case" situation in Figure 4, but for specified numbers of ones and zeros in the arrays. Note that the period lengths span more than a factor of 1000 but have a modal value of 2^7 - 1 = 127. This is expected because a sequence of 15 bits can be represented, in the absence of singularities, by an autoregression that yields eight equations in seven unknowns (corresponding to seven LFSR stages), and a seven-stage LFSR has a maximum period of 2^7 - 1. Note also that arrays balanced with nearly equal numbers of ones and zeros have fewer extreme period lengths than do unbalanced arrays.
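The period of a given LFSR and initial loading can be found by stepping the register until a state repeats. In the sketch below (names illustrative), the connection polynomial uses the same convention as the Berlekamp-Massey output, [1, c1, ..., cL] with s[k] = c1·s[k-1] XOR ... XOR cL·s[k-L]. When cL = 0 the register is "singular" in the sense used above: the sequence may pass through a transient before it becomes periodic, and only the repeating part is counted.

```python
def lfsr_period(c, initial):
    """Eventual cycle length of the binary LFSR defined by c = [1, c1, ..., cL].

    initial holds the first L sequence values s[0], ..., s[L-1].
    """
    L = len(c) - 1
    state = tuple(initial)        # (s[k-L], ..., s[k-1])
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        nxt = 0
        for j in range(1, L + 1):
            nxt ^= c[j] & state[L - j]
        state = state[1:] + (nxt,)
        step += 1
    return step - seen[state]


print(lfsr_period([1, 0, 1, 1], (0, 0, 1)))                          # 7
print(lfsr_period([1, 0, 0, 0, 0, 0, 1, 1], (0, 0, 0, 0, 0, 0, 1)))  # 127
```

The second example uses the maximum-length seven-stage register with feedback polynomial x^7 + x + 1, whose period is the modal value 2^7 - 1.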


Observations on image coding using LFSRs


Image coding using LFSR sequences divides naturally into two cases: coding using maximum-length LFSR sequences and coding using non-maximum-length LFSRs. In the maximum-length case, which may be useful, for example, in synthetic texture coding, q^n - 1 q-ary pixels may be represented by 2n q-ary values (n feedback connections and n initial loadings). Thus, if n = 16 and q = 2, 65535 pixels may be represented by only 32 binary values, a potentially great advantage. Maximum-length coding is also characterized by minimum complexity (minimum number of LFSR stages²) and maximum randomness (the three properties necessary for pseudorandomness are realized). In the non-maximum-length case, however, only limited data compression may be possible: Figure 4 indicates that 20% may be achieved on the average for random 3-x-5 binary arrays. It may be concluded that maximum-length LFSR sequences offer large potential data representation or compression advantages and that these advantages may be realized in coding random or apparently random images.
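The n = 16 example can be made concrete: 16 tap bits and 16 initial-loading bits expand into one full period of 65535 output bits. The sketch below uses a right-shifting register in which the feedback bit is the parity of (state AND tap mask) and enters at the top bit, with the output taken from bit 0. The tap mask 0b101101 corresponds to the feedback polynomial x^16 + x^14 + x^13 + x^11 + 1, a standard maximum-length choice; the seed 0xACE1 is arbitrary. Both are illustrative assumptions, not values from this paper.

```python
def decode_max_length(tap_mask, seed, n=16):
    """Expand an n-bit tap mask and an n-bit seed into one period of LFSR output."""
    state = seed
    pixels = []
    for _ in range(2**n):
        pixels.append(state & 1)                             # output: last stage
        feedback = bin(state & tap_mask).count("1") & 1      # XOR of tapped stages
        state = (state >> 1) | (feedback << (n - 1))
        if state == seed:
            return pixels
    raise ValueError("tap mask does not return to the seed within 2**n steps")


pixels = decode_max_length(0b101101, 0xACE1)
print(len(pixels), sum(pixels))   # 65535 32768
```

To obtain a two-dimensional image, the 65535 values could be folded with the MacWilliams-Sloane construction into a 255 × 257 (2^8 - 1 by 2^8 + 1) array.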
Figure 1. Example of a linear feedback shift register (LFSR): n = 3 stages, maximum length.


Figure 2. MacWilliams-Sloane pseudorandom array construction
Figure 3. Outline of Berlekamp-Massey algorithm




Figure 4. Histograms of LFSR stage number: random bits in a 3-x-5 array, sequence read using MacWilliams-Sloane construction.
Figure 5. Histograms of LFSR period length: random bits in a 3-x-5 array; specified numbers of ones and zeros.
Abstract

Optical threshold logic implementations of register-level operations such as multiply-accumulate are considered. Specific 2- and 8-bit multiply-add designs using both conventional (Boolean) and threshold logic are described and compared. The threshold logic designs are shown to have advantages of factors of 2 or 3 in number of logic levels, number of logic elements, and number of interconnections. The possibility of all-optical implementations (optical logic elements and optical internal connections) of these designs is discussed in terms of integrated optical networks and nonlinear optical devices, particularly in GaAs structures.

Introduction

Optics-based digital processing may be carried out on the processor, register, or gate level. Many processor-level systems have been described, including systolic optical array processors that make effective use of acousto-optic cells, lens assemblies, and optical source and detector arrays to perform digital matrix-vector multiplication or other matrix algebra operations.¹⁻³ Numerous optics-based gate-level devices have also been considered, particularly devices based on optical bistability⁴,⁵ that could be very high-speed replacements for conventional all-electronic logic gates. Optics-based register-level modules such as scalar multipliers, multiply-accumulators, correlators, etc., have, with some exceptions,⁶⁻⁹ received less attention. Desirable features of optically implemented register-level architectures include modularity (for fault tolerance and application flexibility) and either electronic or optical inputs and outputs (for compatibility with other components or subsystems).

Since optical-electronic (or electronic-optical) conversions are generally costly in terms of speed, power consumption, device size, etc., it is anticipated that optics-based register-level modules will require all-optical (or nearly all-optical) internal connections in order to realize a potential for substantial performance improvements compared to all-electronic designs. Such optics-based modules need not make use of the same Boolean logic gate structures (OR, NAND, etc.) that have become conventional in all-electronic integrated circuit technology. In particular, threshold logic constitutes a more general technology that includes conventional Boolean logic as a special case and that requires no more, and often significantly fewer, logic levels, elements, and interconnections to carry out the same function. Threshold logic elements and networks may have effective integrated (or near-integrated) optical implementations involving nonlinear optical (e.g., bistable) devices.

Some specific multiply-add threshold logic networks are considered in this paper, and potential all-optical implementations are discussed.

Threshold Logic Elements

A threshold logic element, in the form usually considered in the literature,¹⁰⁻¹² has several binary inputs and one binary output. It executes a logic function of the inputs that depends on the sum of the products obtained by multiplying each binary input by a fixed analog weight. If this sum is less than a fixed threshold level, the element output is zero; otherwise the output is one. For example, an element with binary inputs $x_1$, $x_2$, $x_3$, and $x_4$, corresponding weights $w_1 = 2$, $w_2 = 1$, $w_3 = 3$, and $w_4 = 1$, and a threshold level $T$ in the range $4 < T \le 5$ implements the logic function $y = \bar{x}_1 x_2 x_3 x_4 + x_1 x_3$, where $y$ is the binary output, the implied product is an AND operation, the plus sign is an OR operation, and the superscript bar is a NOT operation. Note that the implementation of this function using conventional logic requires three logic levels (including inversion), whereas the threshold logic element constitutes a single logic level and thus has a potential factor-of-three speed advantage. Note also that in this example there may be up to an 11% variation in the value of $T$ (±0.5 about the midpoint 4.5 of its allowed range) without affecting element operation, that similar tolerances will apply to the analog weights, and that these generally imprecise analog weighting and thresholding operations may be viewed as intuitively appropriate for optical implementation.
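The example element can be checked exhaustively over all 16 input patterns. The sketch below compares the weighted-sum element with the Boolean form of the function for several threshold values, confirming that any T with 4 < T ≤ 5 works while T = 4 and T = 5.1 do not; the helper names are illustrative.

```python
from itertools import product


def threshold_element(x, w, t):
    """Output 1 if sum(w_i * x_i) >= t, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= t)


def target(x1, x2, x3, x4):
    """y = (NOT x1) x2 x3 x4 + x1 x3."""
    return int(((not x1) and x2 and x3 and x4) or (x1 and x3))


WEIGHTS = (2, 1, 3, 1)


def matches(t):
    """True if the element with threshold t reproduces the whole truth table."""
    return all(threshold_element(x, WEIGHTS, t) == target(*x)
               for x in product((0, 1), repeat=4))


print([t for t in (3.9, 4.0, 4.1, 4.5, 5.0, 5.1) if matches(t)])  # [4.1, 4.5, 5.0]
print(f"{0.5 / 4.5:.0%}")  # 11%: tolerance about the midpoint T = 4.5
```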
Characteristics such as fewer logic levels, practical analog weight and threshold tolerances, and possible optical implementations may apply to threshold logic elements in general and to suitably designed networks of such elements.

Threshold  Logic  Networks 

Conventional Boolean logic and threshold logic designs may be compared for specific register-level operations. One relatively simple example is a two-bit multiply-add module. This module multiplies two two-bit input numbers M and N, adds the result to a five-bit input number X, and outputs the result as a five-bit number Y. Such a module could be the heart of a clocked multiply-add module in which input and output registers and timing control would be included. (For examples of existing digital electronic chips with this function, see "The VLSI Data Book," pp. 231-294, TRW, La Jolla, CA, 1984.) In clocked operation, the output Y could be fed back to the input X to achieve a multiply-accumulate operation with the capability of accumulating up to three products without overflow (each product is at most 3 × 3 = 9, and 3 × 9 = 27 fits in five bits, whereas 4 × 9 = 36 does not).
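The module's input-output behavior, independent of how it is built from gates, can be modeled in a few lines. The sketch below is a behavioral model only (names illustrative): it feeds Y back to X on each simulated clock and flags overflow, showing that three worst-case products fit in five bits and a fourth does not.

```python
def multiply_add(m, n, x):
    """Behavioral model of the 2-bit x 2-bit + 5-bit -> 5-bit module.

    Returns (y, overflow): y = (m * n + x) mod 32, and overflow is True if the
    exact result does not fit in five bits.
    """
    assert 0 <= m < 4 and 0 <= n < 4 and 0 <= x < 32
    total = m * n + x
    return total % 32, total >= 32


def accumulate(pairs):
    """Feed Y back to X on each clock, as in multiply-accumulate operation."""
    y, overflowed = 0, False
    for m, n in pairs:
        y, ovf = multiply_add(m, n, y)
        overflowed = overflowed or ovf
    return y, overflowed


print(accumulate([(3, 3)] * 3))  # (27, False)
print(accumulate([(3, 3)] * 4))  # 36 wraps to 4: (4, True)
```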

A conventional Boolean logic design for a two-bit multiply-add module is diagrammed in Figure 1, where the subscripts on M, N, X, and Y designate binary number position (2⁰, 2¹, etc.). For simplicity, two well-known and frequently occurring multigate configurations have been grouped and represented by single symbols. The exclusive-OR (XOR) function may be accomplished in two logic levels using two AND gates, one OR gate, and two inverters.

The full-adder is a standard functional module that can be cascaded to form a ripple-carry adder for two numbers of any bit length. The three inputs, a, b, and c, represent the two corresponding-order bits of the two input numbers and the carry bit, which is obtained (normally) from the carry output of the previous adder. The output s is the corresponding bit of the sum, and the carry output is the carry bit normally fed to the next adder. A minimized two-level logic configuration that performs the full-adder function requires five AND gates, two OR gates, and four inverters.
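The grouped symbols can be modeled functionally as below: XOR written in its two-level AND/OR/NOT form, a full-adder with inputs a, b, and carry-in c, and a ripple-carry adder formed by cascading full-adders. This is a logic-level sketch for checking behavior, not a gate-minimized netlist, and the function names are illustrative.

```python
def xor(a, b):
    """Two-level XOR: two AND gates, one OR gate, two inverters."""
    return int((a and not b) or (not a and b))


def full_adder(a, b, c):
    """Return (s, carry_out) for input bits a, b and carry-in c."""
    s = xor(xor(a, b), c)
    carry_out = int((a and b) or (a and c) or (b and c))
    return s, carry_out


def ripple_carry_add(x, y, width):
    """Cascade `width` full-adders, least significant bit first."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(19, 9, 5))  # (28, 0)
print(ripple_carry_add(27, 9, 5))  # 36 = 32 + 4: (4, 1)
```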

A threshold logic design for a two-bit multiply-add module¹³ is diagrammed in Figure 2 and is composed entirely of threshold logic elements, with fan-in and fan-out limited to five. Weights are indicated inside the element symbol adjacent to the element input lines, and the element threshold is indicated adjacent to the output line.

Note that the Boolean logic design in Figure 1 requires a total of 38 logic gates and 18 inverters. Inverters are not normally included in the logic level count; however, they do require circuitry and space. This design has a maximum propagation path of 9 logic levels. The threshold logic design in Figure 2 requires la threshold logic elements and involves only 5 logic levels. It is also meaningful to compare the number of interconnect lines required by the designs. This count is 116 for the Boolean logic design versus 70 for the threshold logic design. The comparison may thus be summarized by stating that the threshold logic design is superior by factors of roughly two with regard to number of logic levels, gate count, and number of interconnections. Although the design example analyzed is of questionable practical significance, the comparison strongly favors threshold logic.

A similar design comparison was carried out for what is perhaps a more practically useful case: an 8-bit multiplier-adder that multiplies two 8-bit numbers, adds a 21-bit number, and outputs a 21-bit result. In this case, the fan-in and fan-out constraints were increased to eight. The results of this design¹³ (with the 2-bit multiplier-adder results in parentheses) are summarized in Figure 3. As can be seen, the threshold logic advantage in gate count ratio has increased to almost three-to-one, while the logic level and interconnection ratios have remained at about two-to-one. This design is reasonably complex, and one is tempted to conclude that the results are indicative of what may be obtained for more general complex designs. In any case, the results indicate a significant advantage for at least one design of practical importance. If viewed in terms of the ratio of processing speed to power consumption, it can be argued that the results also indicate an advantage considerably greater than just the separate level, element, or interconnection ratios, since these ratios each contribute to increased processing speed, reduced power consumption, or both.

Optical Threshold Logic Implementation

As indicated above, it is anticipated that all-optical elements and internal connections will be required for register-level modules that realize their potential for commanding and enduring performance advantages compared to all-electronic designs. If size advantages are also to be realized, integrated optical technology appears to constitute the only viable approach. Furthermore, among the many material systems that have been investigated in this technology (LiNbO₃, glass/SiO₂/Si, etc.), only GaAs/GaAlAs systems have a strong potential for the complete integration of optical sources and detectors on one substrate. Recently, enhanced electro-optic effects reported in GaAs/GaAlAs multiple quantum well (MQW) structures¹⁴⁻¹⁶ have opened up the prospect of the eventual integration of high-performance optical gates in dense arrays on GaAs integrated optical structures. (Other
The thresholding function for TL elements would ideally be carried out using nonlinear optical methods, particularly methods related to optical bistability. For such methods, an input intensity below a threshold level yields a small output intensity, while an input intensity above the threshold level yields a large output intensity. Recent work¹⁹ indicates that the required material nonlinearities for these methods are two to four orders of magnitude beyond the values that characterize current uniform electro-optic materials, such as LiNbO₃ or GaAs, for the case where the optical input is provided by laser diode sources. However, MQW structures and certain "exotic" materials such as multiple-layer Langmuir-Blodgett organic films²⁰ may have the required nonlinearity properties. In particular, a variety of single-element MQW electro-optic devices operating at room temperature with promising speed and energy parameters have recently been reported.¹⁴ These devices are relatively simple structures that have good potential to be integrated in large arrays of small-area devices, i.e., in GaAs integrated optical circuits. A key feature is that this and several other such MQW GaAs structures have optical inputs and outputs, thus supporting the concept of all-optical modules that use optical logic elements and optical internal connections exclusively.


As a basis for a discussion of the potential of register-level optical threshold logic elements and networks, consider the all-electronic 16-bit multiplier chip that has been announced by Fujitsu.²¹ This GaAs chip performs a 16-bit multiply in 10.5 ns, consumes
approximately  0.3  W,  contains  about  3168  conventional  logic  gates,  and  has  roughly  18 
logic levels if 0.6 ns per level is assumed (10.5 ns / 0.6 ns ≈ 18). Thus, the chip performs at a rate of about 95 megaoperations per second (MOPS), one multiply per 10.5 ns, with a specific power consumption of about 300 MOPS/Watt.

As a very crude projection, it may be assumed that a factor-of-three advantage in number of logic elements and no advantage in logic levels would be realized in a threshold logic 16-bit multiplier design. Thus about 1,100 optically connected threshold logic elements would be required. Recent results¹⁴ demonstrated switching energies of 4 to 20 femtojoules per square micron for GaAs MQW gates. Based on an assumed switching time of 100 ps per logic level, the time for a 16-bit multiply is 18 × 100 ps = 1.8 ns. Assuming an eight-square-micron gate area (32 femtojoules per switching event at 4 fJ/μm²), the power dissipation for the threshold logic module may be as small as 1,100 elements
× 32 femtojoules / 1.8 ns ≈ 20 mW. If the size of a threshold logic element is estimated to be 30 microns by 100 microns (including an allowance for interconnections), then the area required by 1,100 such elements is 3.3 mm², which is comparable to the area of typical
current IC chips. The throughput of the projected optical module is about 560 MOPS (one multiply per 1.8 ns), and the optical module has a specific power consumption of about 28,000 MOPS/Watt, two orders of magnitude better than the all-electronic chip.
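The chain of estimates in this projection can be checked with a few lines of arithmetic. The sketch below uses the electronic-chip figures and the optical assumptions stated in the text (1,100 elements, 18 logic levels at 100 ps, 8 μm² gates at 4 fJ/μm²) and assumes, as the projection implicitly does, that every element switches once per multiply. The dictionary keys and function names are illustrative.

```python
# Assumptions taken from the text; edit them to explore other projections.
ELECTRONIC = dict(multiply_time_s=10.5e-9, power_w=0.3, ns_per_level=0.6)
OPTICAL = dict(elements=1100, levels=18, switch_time_s=100e-12,
               gate_area_um2=8, energy_fj_per_um2=4)


def electronic_figures(p):
    mops = 1e-6 / p["multiply_time_s"]   # one multiply per multiply time
    return {"levels": p["multiply_time_s"] * 1e9 / p["ns_per_level"],
            "MOPS": mops, "MOPS_per_W": mops / p["power_w"]}


def optical_figures(p):
    multiply_time = p["levels"] * p["switch_time_s"]
    energy_per_switch = p["gate_area_um2"] * p["energy_fj_per_um2"] * 1e-15
    power = p["elements"] * energy_per_switch / multiply_time   # one switch per element
    mops = 1e-6 / multiply_time
    return {"multiply_time_ns": multiply_time * 1e9, "power_mW": power * 1e3,
            "MOPS": mops, "MOPS_per_W": mops / power}


e, o = electronic_figures(ELECTRONIC), optical_figures(OPTICAL)
print({k: round(v, 1) for k, v in e.items()})
print({k: round(v, 1) for k, v in o.items()})
print(f"improvement in MOPS/W: {o['MOPS_per_W'] / e['MOPS_per_W']:.0f}x")
```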
Figure 1. Conventional Boolean Logic (2 × 2 + 5 = 5)-Bit Multiplier-Adder.
Figure 2. Threshold Logic (2 × 2 + 5 = 5)-Bit Multiplier-Adder.
Abstract

Optically implemented threshold logic systems that are characterized by thresholding operations concentrated at one functional location are considered. The objective is to identify architectures and associated integrated optical and holographic techniques that might be used to design superior register-level computation modules. A complete design for a lumped threshold 2-bit multiplier is presented as an example, and methods for general lumped threshold module synthesis are discussed.

Introduction 

Threshold logic has received attention recently because it may provide significant performance advantages for a broad range of mathematical operations and because it may have efficient optical implementations.¹,² This paper considers lumped threshold logic systems, which are defined as systems in which thresholding operations are concentrated at one functional location. The objective is to identify architectures and associated holographic and integrated optical techniques that might be used to design register-level computation modules, such as pure-radix multipliers, multiply-accumulators, etc., with commanding and enduring advantages in speed, power consumption, size, fault tolerance, etc., over current and projected all-electronic alternatives.

Figure 1 shows two general types of systems that relate sets of inputs and outputs using weighting and thresholding operations. Here "weighting" refers to interconnects with selected connection strengths, and "thresholding" refers to decisions based on inequality criteria. Distributed threshold logic systems generally have numerous distinct elements of the same type, each of which performs weighting and thresholding functions. For example, each element could conceivably be an optical-input-optical-output multiple quantum well (MQW) gate, and all elements could be optically interconnected so that the thresholding operations would be distributed throughout the system. In contrast, lumped threshold logic systems generally have only two functional units, one for weighting and one for thresholding. For example, the weighting operation could be accomplished by passive or active (i.e., programmable) integrated optical diffracting elements, and the thresholding operation could be accomplished by photodetectors at an optical-to-electronic output interface. In this case, the (nonlinear) thresholding operation is global, or concentrated at the output of the system.


Lumped Threshold 2-Bit Multiplier

The 2-bit multiplier may be used as a simple example of lumped threshold logic design. Figure 2 is a 2-bit multiplier truth table for the multiplication of binary numbers $x_1 x_0$ and $y_1 y_0$ to obtain $z_3 z_2 z_1 z_0$. Suppose that the four input bits are represented by 0 if they are zero and by $x_1 = A_1 \exp(i\phi_1)$, $x_0 = A_2 \exp(i\phi_2)$, $y_1 = A_3 \exp(i\phi_3)$, and $y_0 = A_4 \exp(i\phi_4)$ if they are ones. If these expressions correspond to waves in a geometrical optics approximation where all source-source, source-detector, and detector-detector distances are large compared to the wavelength, then the 2-bit multiplier may be designed as shown in Figure 3a. Here $x_0$, $x_1$, $y_0$, and $y_1$ are optical point sources, and $z_0$, $z_1$, $z_2$, and $z_3$ are point photodetectors. The lines indicate optical paths, each of which may have a selected attenuation and phase shift that might be implemented by a hologram or integrated optical diffracting elements.
The required attenuations and phase shifts may be obtained by solving sets of simultaneous
nonlinear inequalities derived from the truth table. For example, the 12th row and
the $z_2$ column of the table (inputs $x_1 = y_1 = y_0 = 1$, $x_0 = 0$, so the product is $2 \times 3 = 6$ and $z_2 = 1$)
imply a signal at photodetector $z_2$ that must equal or exceed a threshold $T_2$:

$$\left|A_1 e^{i\phi_1} + A_3 e^{i\phi_3} + A_4 e^{i\phi_4}\right|^2 \ge T_2 \qquad (1)$$


Similar expressions may be obtained so that each of the four output columns (labeled $z_3$, $z_2$,
$z_1$, and $z_0$) is described by a set of 16 (one for each table row) simultaneous nonlinear
inequalities in 9 unknowns: four amplitudes ($A_1$, $A_2$, $A_3$, and $A_4$); four phases ($\phi_1$, $\phi_2$, $\phi_3$,
and $\phi_4$); and one threshold ($T_1$, $T_2$, $T_3$, or $T_4$). Solutions are to be found for each of these
four overdetermined inequality sets (which involve terms such as $A_3^2$, $2A_1A_2\cos(\phi_1 - \phi_2)$, etc.)
such that the amplitudes, phases, and thresholds obtained all have acceptable tolerances or
ranges over which they may vary without affecting proper 2-bit multiplier operation.

One solution which involves only phase shifts (no attenuations) and which may have
practical tolerances is given in Figure 3b, where the $\phi$ column gives the phase shifts required
for each of the four paths (in order) to each detector, the $T$ column gives the threshold
value for each detector when $A_1 = A_2 = A_3 = A_4 = 1$, and the $\Delta T/R$ column gives the fraction
of the total signal range on each detector over which its threshold may vary. Figure 4 is a
histogram of $\Delta T/R$ for output $z_2$ generated by selecting each of the four phases for this
output randomly from normal distributions with means at their design values and standard
deviations equal to $\arctan(0.1)$. These standard deviations correspond to a 10% sideways displacement
of the unit phase vectors, and Figure 4 shows that such variations reduce the threshold tolerance
$\Delta T/R$ for output $z_2$ from 37% to about 20%. Similar acceptable tolerances may be obtained for
the other outputs and should be amenable to engineering design.
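The tolerance figure $\Delta T/R$ can be checked numerically. The sketch below is a minimal, hypothetical model, not the authors' program: it assumes unit amplitudes, the input order $(x_1, x_0, y_1, y_0)$, and defines $\Delta T/R$ as the gap between the weakest "1" signal and the strongest "0" signal divided by the full signal range on the detector. The Figure 3b phase values are not reproduced here, so the usage lines evaluate output $z_3$ with all-zero phases purely as a hand-checkable case; the Monte Carlo routine mirrors the Figure 4 procedure (phase noise with standard deviation $\arctan(0.1)$).

```python
import itertools

import numpy as np

# The 16 truth-table input rows, ordered as (x1, x0, y1, y0).
ROWS = np.array(list(itertools.product([0, 1], repeat=4)))


def product_bits(row):
    """Return (z3, z2, z1, z0) for the 2-bit product (x1 x0) * (y1 y0)."""
    x1, x0, y1, y0 = row
    z = (2 * x1 + x0) * (2 * y1 + y0)
    return np.array([(z >> k) & 1 for k in (3, 2, 1, 0)])


TRUTH = np.array([product_bits(r) for r in ROWS])  # shape (16, 4): z3, z2, z1, z0


def detector_signals(phases):
    """Phase-only design (all A_j = 1): |sum_j bit_j * exp(i phi_j)|^2 for every row."""
    return np.abs(ROWS @ np.exp(1j * np.asarray(phases))) ** 2


def threshold_tolerance(phases, output):
    """Delta T / R for one output column (0 -> z3, 1 -> z2, 2 -> z1, 3 -> z0).

    Gap between the weakest "1" signal and the strongest "0" signal, divided by
    the full signal range on the detector; a negative value means no threshold works.
    """
    s = detector_signals(phases)
    want = TRUTH[:, output]
    gap = s[want == 1].min() - s[want == 0].max()
    return gap / (s.max() - s.min())


def monte_carlo_tolerance(phases, output, sigma=np.arctan(0.1), trials=2000, seed=0):
    """Redraw each phase from N(design value, sigma) and recompute Delta T / R."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=phases, scale=sigma, size=(trials, 4))
    return np.array([threshold_tolerance(p, output) for p in samples])


# Hand-checkable example (not the Figure 3b design): output z3 with all-zero phases.
nominal = threshold_tolerance(np.zeros(4), output=0)   # (16 - 9) / 16 = 0.4375
spread = monte_carlo_tolerance(np.zeros(4), output=0)
counts, edges = np.histogram(spread, bins=20)          # analogous to a Figure 4 histogram
print(nominal, spread.mean())
```

Substituting the design phases of Figure 3b for the zeros would give the corresponding tolerance and histogram for each output.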


Optical  Implementations 


The  2-bit  multiplier  design  described  above  is  based  on  the  ability  of  optics  to  provide 
noninterfering  interconnections3  which  (1)  are  parallel  in  that  interconnection  time  is 
essentially  independent  of  interconnection  length  or  weight  and  which  (2)  lead  to  system 
operation  times  essentially  limited  only  by  the  response  times  of  sources  or  detectors. 

These  interconnections  may  be  provided  by  passive  diffracting  elements  in  the  form  of  an 
ordinary  or  bulk  thin  or  thick  film  hologram  in  which  light  propagates  approximately  normal 
to  the  hologram  plane.  These  interconnections  may  also  be  provided  in  integrated  optical 
implementations  by  passive  diffracting  elements  formed  on  or  near  a  substrate  surface  such 
that light propagates approximately parallel to the surface. Such integrated optical
implementations could use surface relief or photorefractive mechanisms to form the diffracting
elements on GaAs, LiNbO3, glass, or other substrates.


Integrated optics has potential for implementing lumped threshold computation modules
with advantages in size, power consumption, reliability, etc. This technology also
has potential for implementing real-time programmable interconnections or weightings using
electro-optically modulated diffracting element structures (or, ultimately, all-optical
nonlinear devices). This capability would be important, for example, for neural network
architectures that could perform "intelligent" adaptive and symbolic processing.4 Figure 5a
shows a direct implementation of programmable interconnections using a segmented array
electro-optic grating.3 Here the interconnections shown in Figure 3a are reordered so
that no connection paths cross; a uniform optical input is provided, and one of two
voltages is applied to the grating segments in accordance with the input bits $x_1$, $x_0$, $y_1$, and $y_0$. Note
that individual grating segments $G_j$ are tilted at angles $\theta_j$ so that parallel input
beams $j$ are directed to a detector with threshold $T$ after weight $W_j$ is applied, as
determined by programmable voltages $V_j$. However, since there must be a minimum deflection angle,
the lateral separation $D$ must become large as the number of segments $N$ increases. Figure 5b
shows one way of circumventing this problem using an integrated optical lens, which also
permits (1) a common tilt angle for all grating segments and (2) the elimination at a stop (or
monitoring) of undiffracted light. Figure 6 shows an integrated electro-optical channel-guide
implementation. Here the channels are addressed through horns, and phase modulation
is provided by surface electrodes. The output horns terminate in multimode regions whose
outputs are plane waves which impinge on a surface grating at the Bragg angle. The
contributions from the individual horns mix in the grating and produce an output which is a
function of the phase differences between all pairs of input beams. Advantages of this
arrangement are compactness and isolation of the detector from stray light.


Figure 7 shows how a hologram might be optically generated for implementing certain
input-output relationships or truth tables (including the 2-bit multiplier truth table) in a
lumped threshold system. One possibility, the Fourier transform hologram in Figure 7a, is
multiply recorded using object sources $O$ related to output truth table elements and
reference sources $R$ related to input truth table elements. Note that these sources need not
be evenly spaced. The Fourier transform reconstruction in Figure 7b generates output truth
table elements $A$, given input truth table elements $C$. Using standard models of the
holographic process, it may be shown that the $L \times P$ output truth table matrix $A$ is related to
the $N \times P$ input truth table matrix $C$ by

$$A = O R^{\dagger} C \qquad (2)$$


where $O$ and $R$ are $L \times M$ and $N \times M$ matrices describing the complex amplitudes used in recording
the $M$-fold exposed hologram, $m = 1, 2, \ldots, M$, $p = 1, 2, \ldots, P$, and $\dagger$ is the conjugate transpose
operation. An important aspect of Eq. (2) is that although many exposures may be used to
record the hologram, the ability of the hologram to represent input-output relationships is
described by no more than the $NL$ complex elements of $OR^{\dagger}$. In the 2-bit multiplier, for
example, where $N = L = 4$ and $P = 16$, only 16 complex parameters are available to relate 64
input bits to 64 output bits. This suggests that not all possible truth tables are
realizable in an optically recorded hologram of the type considered here. An analogous situation
is that not all logic functions can be implemented by single threshold logic elements.6
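The parameter count can be made concrete with a short sketch. It records $M$ hypothetical exposures with random complex object and reference amplitudes (the random values and $M = 10$ are illustrative assumptions) and checks that the exposure-by-exposure sum of outer products collapses to the single $L \times N$ matrix $OR^{\dagger}$ of Eq. (2), whatever the number of exposures.

```python
import numpy as np

rng = np.random.default_rng(1)
L, N, M = 4, 4, 10  # outputs, inputs, exposures (M chosen arbitrarily)

# Column m of O and R holds the object / reference amplitudes of exposure m.
O = rng.normal(size=(L, M)) + 1j * rng.normal(size=(L, M))
R = rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))

# Exposure-by-exposure sum of outer products O[:, m] R[:, m]^dagger ...
summed = sum(np.outer(O[:, m], R[:, m].conj()) for m in range(M))
# ... equals one L x N matrix, so only L * N complex numbers survive.
OR_dag = O @ R.conj().T

# Input truth table C (N x P): column p holds the bits (x1, x0, y1, y0) of p.
C = np.array([[(p >> (N - 1 - n)) & 1 for p in range(16)] for n in range(N)])
A = OR_dag @ C  # L x P reconstructed outputs, Eq. (2)

print(OR_dag.size, "complex parameters for", A.size, "output values")
```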

It would be useful to at least approximately solve Eq. (2) for $OR^{\dagger}$ in terms of $C$ and $A$.
This matrix equation is generally overdetermined, and least-squares or pseudoinverse
methods might be used to obtain an approximate solution. The (row-by-row) least-squares
solution, for example, is

$$OR^{\dagger} = A C^{\dagger} \left(C C^{\dagger}\right)^{-1} \qquad (3)$$


While  this  solution  may  not  yield  the  desired  truth  table  realization  in  a  lumped  threshold 
system,  it  may  serve  as  a  starting  point  for  a  steepest  descent  or  other  computer  search  for 
desired  solutions.  Such  solutions  should  maintain  the  desired  input-output  relationship 
when  the  matrix  elements  are  varied  over  an  acceptable  tolerance  range.  The  geometrical 
optics  phase-only  solution  for  the  lumped  threshold  2-bit  multiplier  described  in  Figure 
3b is such a solution and may be used to derive an $OR^{\dagger}$ matrix in which all elements have
unit  magnitude: 


[Eq. (4): the $4 \times 4$ matrix $OR^{\dagger}$, all of whose elements have unit magnitude, with entries of the forms $\pm 1$, $(\pm\sqrt{15} + i)/4$, and $(-1 \pm i\sqrt{3})/2$.]     (4)


A particular implementation of this solution for holographic recording is $O$ = the $4 \times 4$
identity matrix and $R = (OR^{\dagger})^{\dagger}$. Note that although the above analysis implies
three-dimensional holographic systems, integrated optical assemblies of diffracting elements
similar in function to bulk holograms may be practical. This possibility is related to the
observation  that  the  multiple  truth  table  "images"  to  be  recorded  and  reconstructed, 
although  often  highly  cross-correlated,  may  be  relatively  simple  or  low-resolution  bright- 
spot-dark-spot  patterns. 
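As a hedged illustration of Eq. (3), the sketch below computes the row-by-row least-squares $OR^{\dagger}$ for the 2-bit multiplier truth table, cross-checks it against NumPy's generic solver, and forms the recording choice $O = I$, $R = (OR^{\dagger})^{\dagger}$. The bit ordering and variable names are our assumptions. Because the multiplier outputs are not linear functions of the inputs, the least-squares matrix leaves a nonzero residual, which is why it serves only as a starting point for a threshold-aware search.

```python
import itertools

import numpy as np

rows = list(itertools.product([0, 1], repeat=4))   # (x1, x0, y1, y0)
C = np.array(rows).T                                # N x P = 4 x 16 inputs


def bits(r):
    """Output bits (z3, z2, z1, z0) of the 2-bit product for one input row."""
    z = (2 * r[0] + r[1]) * (2 * r[2] + r[3])
    return [(z >> k) & 1 for k in (3, 2, 1, 0)]


A = np.array([bits(r) for r in rows]).T             # L x P = 4 x 16 outputs

# Eq. (3): OR^dagger = A C^dagger (C C^dagger)^-1  (C is real, so C^dagger = C.T).
OR_dag = A @ C.T @ np.linalg.inv(C @ C.T)

# Same answer from a generic least-squares solve of C^T X^T = A^T.
OR_lstsq = np.linalg.lstsq(C.T, A.T, rcond=None)[0].T

# Recording choice from the text: O = identity, R = (OR^dagger)^dagger.
O = np.eye(4)
R = OR_dag.conj().T

residual = np.linalg.norm(O @ R.conj().T @ C - A)
print(np.round(OR_dag, 3))
print("least-squares residual:", residual)
```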


General  Lumped  Threshold  Module  Synthesis 

Optical  or  computer  generated  hologram  synthesis  of  the  weighting  or  interconnecting 
units  required  for  lumped  threshold  computation  modules  will  generally  require  knowledge  of 
the  amplitude  and  phase  patterns  on  the  hologram  that  yield  the  correct  truth  table  input- 
output behavior with maximum weight and threshold tolerances. In the case of the geometrical
optics two-bit multiplier design of Figure 3, expressions governing input-output
behavior  were  easily  obtained.  This  favorable  situation  may  be  uncommon  in  the  design  of 
the  generally  smaller,  more  efficient,  etc.,  lumped  threshold  modules  for  which  geometrical 
optics  approximations  do  not  apply. 

Consider, for example, the derivation of eight far-field holograms, each with the same
two design parameters, that implement, in a lumped threshold system, the eight
positive-threshold two-Boolean-variable functions (i.e., the eight out of the sixteen functions for
which two zero inputs yield a zero output). Figure 8 shows a simple format consisting of a
screen with two pinholes separated by a distance $y$. One pinhole is covered by a
phase-shifting film $\theta$; there is a detector $d$ and lower and upper mutually coherent point sources $l$
and $u$. In the far-field approximation the distances $b$ and $y$ and the wavelength $\lambda = 2\pi/k$
must be small compared to the distance $s$. With this approximation and with $b$ fixed, the
problem reduces to finding values of $y$ and $\theta$ such that the detected signals $I_l$ for only
source $l$ on, $I_u$ for only source $u$ on, and $I_b$ for both sources on have all six possible
inequality relationships. Referring to Figure 8 for definitions, the following approximate
expressions may be derived:


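$$
\begin{aligned}
A_l &= e^{i k(w + s)} + e^{i[k(v + r) + \theta]} \\
A_u &= e^{i k(w + s)} + e^{i[k(u + r) + \theta]} \\
I_l = |A_l|^2 &\approx (ks)^2 (x^2 + \alpha x)^2 + 2\eta\,(ks)(x^2 + \alpha x) + \eta^2 \\
I_u = |A_u|^2 &\approx (ks)^2 (x^2 - \alpha x)^2 + 2\eta\,(ks)(x^2 - \alpha x) + \eta^2 \\
I_b = |A_l + A_u|^2 &\approx 2(ks)^2\left[(x^2 + \alpha x)^2 + (x^2 - \alpha x)^2\right] + 8\eta\,(ks)\,x^2 + 4\eta^2 - 4(ks)^2(\alpha x)^2
\end{aligned}
\qquad (5)
$$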


where $x = y/s$, $\alpha = b/s$, and $\eta = \theta - \pi \approx 0$. Figure 9 is a graph of the approximate
expressions for $I_l$, $I_u$, and $I_b$ versus $x$ for $\lambda = 628$ nm, $b = 10$ µm, $s = 10$ cm, and $\eta = 0.304$.
Note that four of the six inequality relationships can be satisfied using the plotted
values; the other two relationships can be satisfied for other values of $\eta$.
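The approximate intensities of Eq. (5) are easy to tabulate. The sketch below evaluates them with the parameters quoted for Figure 9 as defaults and reports which orderings of $I_l$, $I_u$, $I_b$ occur over a sweep of $x$. The sweep range and function names are assumptions, and because Eq. (5) is a small-angle approximation, the orderings found depend on $\eta$ and on the range swept; other $\eta$ values can be passed in to explore the remaining relationships.

```python
import numpy as np


def pinhole_intensities(x, wavelength=628e-9, b=10e-6, s=0.10, eta=0.304):
    """Approximate detector signals of Eq. (5) versus x = y / s.

    Returns (I_l, I_u, I_b): only source l on, only source u on, both sources on.
    """
    ks = 2 * np.pi / wavelength * s       # k s, with k = 2 pi / lambda
    alpha = b / s
    x = np.asarray(x, dtype=float)
    p, m = x**2 + alpha * x, x**2 - alpha * x
    I_l = (ks * p) ** 2 + 2 * eta * ks * p + eta**2
    I_u = (ks * m) ** 2 + 2 * eta * ks * m + eta**2
    I_b = (2 * ks**2 * (p**2 + m**2) + 8 * eta * ks * x**2
           + 4 * eta**2 - 4 * ks**2 * (alpha * x) ** 2)
    # Algebraically: I_l = (ks p + eta)^2, I_u = (ks m + eta)^2,
    # and I_b = (ks p + ks m + 2 eta)^2.
    return I_l, I_u, I_b


def orderings(x, **params):
    """Distinct orderings seen over x; ('b', 'l', 'u') means I_b > I_l > I_u."""
    I_l, I_u, I_b = pinhole_intensities(x, **params)
    stack = np.stack([I_l, I_u, I_b], axis=1)
    names = np.array(["l", "u", "b"])
    return {tuple(names[np.argsort(-row)]) for row in stack}


xs = np.linspace(-1e-3, 1e-3, 2001)
print(sorted(orderings(xs)))  # orderings reached with the Figure 9 parameters
```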

The example of Figure 9 and Eq. (5) indicate the possible complexity of general
(physical optics) lumped threshold module synthesis. Greater complexity may be anticipated
if Fresnel rather than Fraunhofer diffraction conditions are allowed and if the input-output
truth tables are large. One approach to such synthesis problems is to perform additional
post-photodetection processing and to employ logical reduction and residue arithmetic
techniques to reduce the effective size of the truth tables to be realized.7 The approach
considered here seeks alternatives to requirements for conversion into and out of residue
arithmetic and for additional all-electronic processing.

Further  work  on  lumped  threshold  logic  should  emphasize  studies  of  the  types  and  sizes  of 
realizable  truth  tables  and  should  seek  general  methods  for  synthesizing  holograms  such  that 
the  required  truth  tables  are  realized  with  acceptable  threshold  and  other  tolerances.  A 
straightforward but perhaps limited approach to these studies is to investigate the number
of holograms with specified resolution and cross-correlation characteristics that can be
multiplexed on a single recording medium. A more general approach may be to obtain
expressions (generally large sets of overdetermined simultaneous nonlinear inequalities)
that fully describe a desired optically implemented lumped threshold module and to find
optimal solutions for them using nonlinear programming techniques. This approach will
probably require the use of supercomputer facilities, but in many cases it may be the best
approach  for  obtaining  designs  with  optimum  performance  characteristics. 
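A very small stand-in for the nonlinear-programming step is sketched below. It maximizes $\Delta T/R$ (the gap between the weakest "1" and strongest "0" detector signal, divided by the signal range) over the four phases of one detector of the 2-bit multiplier, assuming unit amplitudes. Hill climbing with random perturbations is our substitution for the unspecified "nonlinear programming techniques"; it finds local optima only, and every name and parameter here is hypothetical.

```python
import itertools

import numpy as np

ROWS = np.array(list(itertools.product([0, 1], repeat=4)))   # (x1, x0, y1, y0)
PRODUCT = (2 * ROWS[:, 0] + ROWS[:, 1]) * (2 * ROWS[:, 2] + ROWS[:, 3])


def tolerance(phases, bit):
    """Delta T / R for product bit `bit` (0 = z0, ..., 3 = z3), unit amplitudes."""
    s = np.abs(ROWS @ np.exp(1j * np.asarray(phases))) ** 2
    want = (PRODUCT >> bit) & 1
    return (s[want == 1].min() - s[want == 0].max()) / (s.max() - s.min())


def hill_climb(bit, start, steps=2000, step_size=0.3, seed=0):
    """Accept random phase perturbations only when Delta T / R improves."""
    rng = np.random.default_rng(seed)
    best = np.asarray(start, dtype=float)
    best_val = tolerance(best, bit)
    for _ in range(steps):
        trial = best + rng.normal(scale=step_size, size=4)
        val = tolerance(trial, bit)
        if val > best_val:
            best, best_val = trial, val
    return np.mod(best, 2 * np.pi), best_val


for bit in range(4):
    phases, val = hill_climb(bit, np.zeros(4))
    print(f"z{bit}: phases={np.round(phases, 2)}, Delta T/R={val:.3f}")
```

A production design search would add amplitudes and thresholds as variables and use a constrained optimizer, but the objective (worst-case margin across all truth-table rows) would be the same.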
Figure 2. 2-bit multiplier truth table. The 12th row of the inputs and of the z2 column
is boxed.
Figure 3. Lumped threshold 2-bit multiplier. (a) Interconnections from sources x to
detectors z. (b) No-attenuation solution for interconnection phases φ, detection
thresholds T, and threshold tolerances ΔT/R.
Figure 6. Integrated electro-optic channel arrays for programmable threshold logic.
Figure 7. Holographic synthesis of lumped threshold logic: (a) recording, (b) reconstruction
(truth table readout).




SPIE Vol. 564 Real Time Signal Processing VIII (1985) / 165




HOLOGRAPHIC  WEIGHTING  AND 
PHASE  CONJUGATION  FOR  EXTERNAL 
THRESHOLDING  ARCHITECTURES 


Gordon  R.  Little 
University  of  Dayton 
Research  Institute 
Dayton,  Ohio  45469 


Abstract 


Possible holographic implementations of the weighting operations
required for external thresholding architectures are
reviewed and extended, particularly for the coherent source,
complex weight case. A four-input-bit experimental effort is
described for this case that includes an improved phase
stabilization/control scheme. The possible use of optical phase
conjugation for significantly improving the performance of
certain external thresholding architectures is also discussed.
Holographic Weighting for External Thresholding Architectures

A  previous  investigation  has  considered  the  use  of 
holographic  interconnects  to  implement  the  weighting  operations 
in  threshold  logic.1  This  work  included  an  analysis  of 
holographic  weighting  for  the  coherent  source,  complex  weight 
case.  A  review  and  extension  of  this  work  is  considered  below. 

A threshold logic element is taken to have $N$ binary input
variables $\{a_n\}$, $n = 1, \ldots, N$, and a single binary output variable $d$
given by

$$d = T\Big(\sum_n w_n a_n\Big), \qquad (1)$$

where $T$ is a thresholding operation and $w_n$ is the set of weights
which are applied to the input variables. Defining $c$ as the
weighted sum of the input variables,

$$c = \sum_n w_n a_n, \qquad (2)$$

and assuming real weights, the output variable can be written

$$d = u(c - t), \qquad (3)$$

where $u$ is the Heaviside step function

$$u(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases} \qquad (4)$$

and $t$ is the threshold. Note that Equation (2) describes the
inner product between a weight vector $\mathbf{w}$ and an input vector $\mathbf{a}$.

In a coherent optical implementation, the weights and the
weighted sum will, in general, be complex. In this case, a
number of thresholding schemes are possible, depending on how one
chooses to partition the output space (the complex plane). The
present investigation adopts a thresholding scheme based on the
modulus squared of the weighted sum,

$$d = u\left(|c|^2 - t\right). \qquad (5)$$
This scheme is appropriate if thresholding is accomplished
electronically  following  standard  photodetection. 

In general, there will be many different input vectors which
may be presented to the logic element, each with a specific
desired output. In addition, we may wish to implement several
logic functions in parallel. Denoting the number of input
vectors by $P$ and the number of logic functions by $M$, we can express
the weighted sum for the composite system by the matrix equation

$$C = WA \qquad (6)$$

Here $A$ is an $N \times P$ matrix whose columns represent the $P$ input
vectors, $W$ is an $M \times N$ matrix whose rows represent the weight
vectors for the $M$ logic functions, and $C$ is an $M \times P$ matrix whose
columns represent the $M$ outputs for each of the $P$ input vectors.

Note that if all possible input vectors are permitted, then $P = 2^N$.
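Equations (3) and (6), together with the modulus-squared rule $d = u(|c|^2 - t)$, can be evaluated for all $P = 2^N$ inputs at once. The sketch below is a minimal illustration; the two-input examples (an AND with real weights, and the weights $1$ and $e^{i\pi} = -1$ with an intensity threshold) are our own choices, picked because the second realizes a function, exclusive OR, that no single real-weight threshold element can.

```python
import itertools

import numpy as np


def all_inputs(n_bits):
    """Matrix A (N x P) whose P = 2**N columns are every binary input vector."""
    return np.array(list(itertools.product([0, 1], repeat=n_bits))).T


def real_threshold(W, A, t):
    """Eqs. (3) and (6): d = u(c - t) with C = W A and real weights."""
    C = W @ A
    return (C - t >= 0).astype(int)            # Heaviside step with u(0) = 1


def intensity_threshold(W, A, t):
    """Modulus-squared scheme: d = u(|c|^2 - t) for complex weights."""
    C = W @ A
    return (np.abs(C) ** 2 - t >= 0).astype(int)


A = all_inputs(2)                                # columns 00, 01, 10, 11
print(real_threshold(np.array([[1.0, 1.0]]), A, 1.5))          # AND
print(intensity_threshold(np.array([[1.0 + 0j, -1.0 + 0j]]), A, 0.5))  # XOR
```

A threshold vector of shape $(M, 1)$ can be passed as `t` to threshold $M$ logic functions in parallel, matching the rows of $W$.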

A fundamental step in the design of a threshold logic
element is the determination of the weight and threshold values
which yield the desired output. For each of the $M$ logic
functions there are $P$ inequalities in $N+1$ unknowns which must be
solved. If the weights are real valued, the inequalities are
linear, but if the weights are complex and if the thresholding
scheme described by Equation (5) is used, then the inequalities are
nonlinear. In either case, the determination of weights and
thresholds which provide acceptable tolerances is a nontrivial
problem. For a fully populated input space (i.e., $P = 2^N$), and for
$N > 3$, appropriate solutions for specific logic functions may not
exist.2

A possible holographic implementation of the weighting in a
threshold logic element is depicted in Figure 1b. Here, the
binary inputs are a set of $N$ mutually coherent point sources
located in the front focal plane of the input lens, and the
weighted sums for the $M$ outputs are represented by a set of $M$
spots in the back focal plane of the output lens. Regarding the
outputs as a generalized reconstruction of a set of object sources,
a potential scheme for fabricating the weighting hologram would
be to superimpose a number of exposures from an array of $M$ object
sources and $N$ reference sources, as shown in Figure 1a. It is this
approach which has been analyzed.

To obtain equations describing the system, the hologram is
taken to be a thin amplitude hologram, and the Fourier transform
geometry shown in Figure 1b is assumed. Taking $R_{nl}$ and $O_{ml}$ to
represent the complex amplitudes used in the $l$th exposure of the
$n$th reference source and the $m$th object source, the transmittance
$T$ of the hologram can be written

$$T = a \sum_l \left| \sum_n R_{nl}\, e^{i\mathbf{k}_n\cdot\mathbf{r}} + \sum_m O_{ml}\, e^{i\mathbf{k}'_m\cdot\mathbf{r}} \right|^2 \qquad (7)$$

where $\mathbf{k}_n$ and $\mathbf{k}'_m$ are the wave vectors of the reference and object
beams incident on the holographic recording medium, $\mathbf{r}$ is the
position vector in the medium, and $a$ is an unimportant constant.

If the reconstruction sources are positioned at the reference
source locations, the complex optical amplitude transmitted by
the hologram, $\psi(\mathbf{r})$, is

$$\psi(\mathbf{r}) = T \sum_n a_n\, e^{i\mathbf{k}_n\cdot\mathbf{r}} \qquad (8)$$
Of the $N(N+M)$ plane wave terms in Equation (9), only the $M$
waves in the third line for which $n' = n$ are of interest. It is
these waves which form the reconstructed images of the object
points. It is seen that these terms can be written
The amplitudes of the $M$ plane waves described by Equation (10) are
the amplitudes of the reconstructed objects. It is seen that the
action of the multiple exposure hologram can be expressed in a
matrix equation

$$C = O R^{\dagger} A, \qquad (11)$$

where $\dagger$ is the Hermitian transpose operation and $O$ and $R$ are
matrices whose columns represent the complex amplitudes of the
object and reference sources used in each exposure. Comparing
Equations (6) and (11), it is seen that the weight matrix and the
hologram matrices are related by $W = OR^{\dagger}$. It should be noted
that the use of thick phase holograms along with careful source
geometry selection will reduce the strength of the undesired
terms to low levels, because the output wave vectors for these terms
will not satisfy the Bragg condition.

From this analysis, it is seen that if the sets of weights
which realize specific sets of threshold logic functions are
known, then a holographic implementation of the weighting can be
designed. In fact, there are many possible designs, depending on
how the weight matrix is factored. A factorization which
emphasizes the independence of the $M$ logic functions and which imposes
minimal phase control constraints in fabrication is to let $O$ be
the $M \times M$ identity matrix and let $R^{\dagger}$ be the $M \times N$ weight matrix.

Then there would be $M$ exposures, each writing a separate logic
function onto the hologram. Since the number of elements in the
weight matrix is limited, a reasonably low number of hologram
exposures is required to implement interesting, realizable
threshold logic functions. A full-precision 16-bit multiplier,
if realizable, would require only 32 exposures (one per output bit). The
content-addressable memory approach,3 on the other hand, requires $\sim 10^{10}$
exposures if standard arithmetic is used and $\sim 10^{3}$ exposures if
logical reduction techniques and residue arithmetic are used.
Critical questions which remain, however, include the
realizability of such high bit-number arithmetic operations and the
tolerance of such a scheme to phase and amplitude errors in
fabrication and readout. It should be noted that there are many
interesting problems in such areas as target recognition and
tracking where the input space is limited ($P \ll 2^N$), making
realizability of a given threshold logic function much more
likely.
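The $O = I$ factorization and the exposure counts can be checked in a few lines. With one exposure per logic function, exposure $l$ uses only object source $l$ together with reference amplitudes $R_{nl} = \overline{W_{ln}}$, so $OR^{\dagger}$ returns $W$ as in Eq. (11). For the comparison we assume, as our own reading of the content-addressable approach with standard arithmetic, one stored exposure per input vector, i.e., $2^{2n}$ for an $n$-bit by $n$-bit multiplier; the random weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 4                                                 # logic functions, input bits
W = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))  # placeholder complex weights

O = np.eye(M)          # exposure l writes object source l only
R = W.conj().T         # so that R^dagger is the M x N weight matrix
assert np.allclose(O @ R.conj().T, W)                       # W = O R^dagger, Eq. (11)


def exposures(n_bits):
    """Exposure counts for an n-bit x n-bit multiplier.

    weighting hologram: one exposure per output bit (2n logic functions);
    table lookup: one exposure per input vector, 2**(2n) (assumed for comparison).
    """
    return 2 * n_bits, 2 ** (2 * n_bits)


print(exposures(16))   # 32 exposures versus 2**32, roughly 4.3e9 (order 10**10)
```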

Experimental  Effort  on  External  Thresholding  Architectures 

To  investigate  the  concepts  derived  from  the  analysis  and 
to  explore  the  problems  involved  in  reducing  the  holographic 
weighting  scheme  to  practice,  a  limited  experimental  effort  was 
carried  out.  This  effort  was  restricted  to  threshold  logic 
functions  of  four  input  bits  and  was  specifically  focused  on  the 
four functions representing the output bits from a two-bit
multiplier.  All  of  these  functions  are  realizable  using  complex 
weights  but  only  three  are  realizable  with  real  weights. 
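The real-weight claim can be probed by brute force. The sketch below searches small integer weights for $d = u(\mathbf{w}\cdot\mathbf{a} - t)$ on each product bit, with the input order $(x_1, x_0, y_1, y_0)$ and the search range as assumptions. A finite search is not a proof, but for $z_1$ a short argument settles it: rows $1001$ and $0110$ must give $z_1 = 1$ while $1111$ and $0000$ must give $z_1 = 0$, and both pairs have the same total weighted sum, so no real threshold can separate them.

```python
import itertools

import numpy as np

ROWS = np.array(list(itertools.product([0, 1], repeat=4)))       # (x1, x0, y1, y0)
PRODUCT = (2 * ROWS[:, 0] + ROWS[:, 1]) * (2 * ROWS[:, 2] + ROWS[:, 3])


def find_integer_weights(bit, limit=3):
    """Search integer weights in [-limit, limit] for d = u(w.a - t).

    Returns (w, t) with t midway between the two classes, or None if none is found.
    """
    want = (PRODUCT >> bit) & 1
    for w in itertools.product(range(-limit, limit + 1), repeat=4):
        c = ROWS @ np.array(w)
        lo, hi = c[want == 0].max(), c[want == 1].min()
        if hi > lo:                        # strict gap: a threshold fits between
            return w, (hi + lo) / 2
    return None


for bit in range(4):
    print(f"z{bit}:", find_integer_weights(bit))   # z1 has no real-weight solution
```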

Clearly, the relative phases of the sources used in
fabricating the weighting hologram must be controlled, and the
precision of control depends on the tolerance requirements in the
thresholding operation. Preliminary experiments also indicated
that thermally induced phase drifts in the laboratory were
sufficiently large to present significant difficulty in carrying out
the experiments. Accordingly, an active phase stabilization/
control scheme was devised. The scheme described by MacQuigg4
was modified to improve the phase control precision from ~10° to
~1°. The phase stabilization/control system as applied to two
sources is depicted in Figure 2. A phase control grating is
fabricated using the two sources and is rotated to produce a
fringe pattern on a detector. The phase of one of the sources is
dithered at the reference oscillator frequency of a lock-in
amplifier to produce a small motion of the fringes. The
resultant oscillatory detector output is fed into the lock-in
amplifier, and the lock-in amplifier output serves as the error
signal to stabilize the DC phase of the dithered source relative
to the other source. Phase control is achieved by translating
the detector within the fringe pattern. Methods based on this
scheme5 may ultimately be important in high-speed optical
computing (perhaps in integrated optical implementations) where
mutually coherent sources are used.
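The dither-and-lock-in loop described above can be sketched as a discrete-time simulation. This is a generic phase-lock model, not MacQuigg's circuit: the fringe signal $I = 1 + \cos(\Delta + a\sin\theta)$, the dither amplitude $a$, the loop gain, and the constant drift rate are all assumptions, and translating the detector in the fringe pattern is modeled as changing the setpoint phase. Demodulating against the dither reference gives an error of about $-(a/2)\sin\Delta$, which an integrating correction drives toward zero.

```python
import numpy as np


def lock_in_error(delta, dither_amp, samples=64):
    """Demodulate one dither period of the fringe signal I = 1 + cos(delta + dither).

    For small dither this is about -(dither_amp / 2) * sin(delta).
    """
    theta = 2 * np.pi * np.arange(samples) / samples     # one reference period
    ref = np.sin(theta)                                    # lock-in reference
    intensity = 1.0 + np.cos(delta + dither_amp * ref)     # detector output
    return np.mean(intensity * ref)


def run_loop(setpoint, phase0=1.0, steps=2000, drift=1e-3, gain=5.0, dither_amp=0.1):
    """Integrating phase lock; returns the final error (phase - setpoint) in radians."""
    phase = phase0
    for _ in range(steps):
        phase += drift                                     # slow thermal drift
        phase += gain * lock_in_error(phase - setpoint, dither_amp)  # correction
    return phase - setpoint


# Moving the detector within the fringe pattern is modeled as changing the setpoint.
for setpoint in (0.0, 0.7):
    print(setpoint, np.degrees(run_loop(setpoint)))
```

The residual error settles where the correction balances the drift, so a higher gain or slower drift tightens the lock; in this model it ends well under 1°.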

The layout that has been constructed for the 2-bit
multiplier experiment is shown in Figure 3. A prism arrangement
is used to generate the eight sources required to construct the
weighting hologram, and piezoelectric-driven mirrors are used to
implement the phase stabilization/control. Using the hologram
design where $R^{\dagger}$ is the weight matrix, only three control loops
for the reference sources are needed.
The primary goal of the present experimental effort was to
demonstrate that optically fabricated holograms could be used for
complex-valued weighting in simple threshold logic modules. One
question raised in the course of the effort is whether
processing-induced distortions in the weighting hologram will significantly
degrade performance. If so, the number of potentially useful
hologram fabrication techniques may be limited. Other questions
which remain to be investigated include the degree of threshold
tolerance which is achievable and the precision requirements for
phase and amplitude control of the sources during fabrication and
readout.

An analysis of the realizability of logic functions within
the holographic weighting context and an examination of the
impact of realizability on architectures are fundamentally
important. Topics include possible improvements in realizability
using nonlinear hologram recording and alternative thresholding
techniques for coherent weighting. The set of logic functions
involved in multiplying two 4-bit numbers may be viewed as a
functional module, and combinatorial logic architectures which
use such modules to perform higher order operations merit further
study.


The holographic weighting schemes investigated thus far
have involved spreading each weight element over the entire
hologram. Improved performance may be realized if the weight
elements are separated. An analysis of this approach would
include treatment of storage density questions, examination of
computer-generated holograms for coherent and incoherent
weighting schemes, and design of simple architectures.

Most  of  the  analytical  and  experimental  work  performed  to 
date  has  concentrated  on  thin  amplitude  holograms  based,  for 
example,  on  silver  halide  plate  technology.  However,  achievement 
of  the  storage  densities  necessary  for  practical  optical  com¬ 
puting  will  probably  require  the  use  of  thick  phase  holograms  and 
photorefractive material technology. Such holograms have the
advantage that no processing (i.e., developing) is required, and
thus processing-induced distortions would be nonexistent. It is
therefore important that the special features, capabilities, and
constraints implied by the use of photorefractive materials in
holographic weighting be considered. Such consideration would
include the analysis of thick phase holograms for coherent
weighting and the design of architectures to utilize multiple
passes through the hologram for implementing selected logic
functions.

Phase  Conjugation  for  External  Thresholding  Architectures 


Optical  phase  conjugation  (OPC)  is  an  important  emerging 
technology  which  may  find  wide  application  in  optical  computing 
and signal processing. Presently identified OPC capabilities of
interest include phase distortion correction, lensless imaging,
gain, and use in associative memories.6,7 It is possible
that eventual implementations of the popular circulating packet
architectures may include phase conjugate devices for level
restoration (gain), for module interconnection and input/output
(lensless imaging), and for prefiltering (associative memories).8

Of immediate interest is the recent demonstration by
Dunning et al.8 of an all-optical associative memory. In this
scheme, depicted in Figure 4,10 a holographic memory is placed
within a resonant cavity bounded by phase conjugate mirrors. A
pattern is introduced to the holographic memory element using a
beam splitter, and several reconstructed reference beams are
produced by the hologram, depending on the degree of match between
the input pattern and the stored patterns. A lens images the
hologram onto one of the phase conjugate mirrors, which has a
threshold level for conjugation. The threshold level is set so
that only the strongest reconstructed reference beam resonates in
the cavity, and the system output is the stored pattern which
best matches the input.
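The thresholded recall step can be caricatured numerically. In the sketch below, which is only an illustration under our own assumptions, each reconstructed reference-beam strength is modeled as the squared inner product of the input with a stored binary pattern, and only beams above the conjugation threshold "resonate". The patterns, the probe, and the threshold value are invented for the example; the physics of the phase conjugate cavity is not modeled.

```python
import numpy as np


def recall(stored, probe, threshold):
    """Indices of stored patterns whose modeled beam strength exceeds the threshold.

    Beam strength is taken as (stored_k . probe)^2, a stand-in for the intensity
    of the k-th reconstructed reference beam.
    """
    strengths = (stored @ probe) ** 2
    return np.flatnonzero(strengths > threshold), strengths


stored = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
])
probe = np.array([1, 0, 1, 0, 1, 0, 1, 1])   # pattern 1 with its last pixel flipped
winners, strengths = recall(stored, probe, threshold=12)
print(strengths, winners)   # strengths 4, 16, 9: only pattern 1 exceeds the threshold
```

Setting the threshold between the largest and second-largest strength is what lets only the best match resonate; too low a threshold would let several stored patterns through.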
Two aspects of this scheme appear to be directly applicable to external thresholding architectures. First, if the resonator output is taken from the other side of the holographic memory, the system becomes a look-up device, i.e., a content-addressable memory whose performance may be significantly improved by the thresholding action of the resonator cavity. It may be possible, for example, to store many more patterns within the memory than would be possible without the resonator. Second, the resonator could be used to form a thresholding device. In this case, the hologram and lens could be eliminated from the system and the input could be a spatially modulated beam of light. In areas where the input intensity is above threshold, the system would resonate and the output would be high, but in areas where the input intensity is low, the system would not resonate and the output would be low. Depending on the characteristics of the thresholding phase conjugate mirror, this may yield significant improvement over other thresholding schemes.
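The second use, the resonator as a spatial thresholding device, reduces in its idealized form to a per-pixel comparison. The sketch below assumes that each region of the cavity resonates independently of its neighbors and that the mirror switches sharply at the threshold; both are simplifying assumptions, and the `high` and `low` output levels are hypothetical parameters (a nonzero `low` can stand in for finite contrast).

```python
def threshold_device(intensity, threshold, high=1.0, low=0.0):
    """Idealized resonator used as a spatial thresholding device.

    Each pixel is treated independently (assumption: no coupling between
    neighboring regions of the cavity). Pixels at or above `threshold` are
    taken to resonate and give `high`; all others give `low`.
    """
    return [[high if v >= threshold else low for v in row] for row in intensity]


frame = [
    [0.1, 0.7, 0.2],
    [0.9, 0.4, 0.6],
]
print(threshold_device(frame, threshold=0.5))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```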
Figure 1. Holographic implementation of weighting in threshold logic: (a) recording, (b) reconstructing.


Figure 2. Phase stabilization/control scheme.
Abstract


The potential of electro-optically implemented Grossberg neural networks is assessed. These networks can perform adaptive pattern recognition using both short- and long-term memory: long-term memory stores feature vectors corresponding to many target patterns; short-term memory adaptively tracks the evolution of these feature vectors. The possible design, fabrication, and testing of an experimental neural network simulator system which contains an electro-optic linear algebra processor as a major component is also assessed. Compared to current processors based on Kalman filtering or related techniques, this type of network should form the basis for an adaptive multisensor processor having a higher order of "intelligence" for pattern recognition and tracking tasks.
Introduction


The neural network models suggested by Grossberg [1976a, 1976b, 1980] have the potential to perform complex pattern recognition and tracking tasks required by strategic defense surveillance, acquisition, and tracking scenarios. These networks will have generic applicability to many problem areas involving similar mathematical structures. For the SDI, a likely configuration would involve output feature values from several sensor preprocessing systems forming the input vector to the network. The network type assessed in this paper can perform adaptive pattern recognition and has both short- and long-term memory, which distinguishes it from other notable models [Hopfield, 1982; Willshaw, 1981; Kohonen, 1983]. The long-term memory (LTM) is of the correlation type [Grossberg, 1976a, 1976b, 1981; Kohonen, 1981, 1983] and can store many reference patterns representing feature vectors corresponding to specific threat or target patterns. The short-term memory (STM) is adaptive and can track the evolution of feature vectors.

A key aspect of the assessment is the implementation of a high-level control structure in the network. Such a structure would constrain the adaptation range of the lower levels to prevent tracking of patterns beyond application-dependent acceptable bounds. This combination of features comprises the basis for an adaptive multisensor control and processing system which should have an "intelligence" of a higher order than that obtained with current techniques such as Kalman filtering. The use of optical algebraic processing as a major part of an experimental neural simulator which could be exercised to study network behavior is also assessed. The remaining sections of this paper thus present the mathematical basis of the model and describe a possible hybrid optical/digital experimental system.
At some instant t we are given an $N \times 1$ input vector u(t) and a set of M normalized $N \times 1$ pattern vectors $\{p_0(t), p_1(t), \ldots, p_{M-1}(t)\}$. We wish to know whether u(t) contains one of the patterns $p_m(t)$ and, if so, which one. Since the patterns themselves may be slowly time-varying, we wish to have our system reflect these slow changes as well as track the input vector u(t). Let u(t) have the structure $[u_0(t), u_1(t), \ldots, u_{N-1}(t)]^T$. Because the individual components $u_i(t)$ and $u_j(t)$ are possibly correlated in ways that do not represent any of the desired patterns $\{p_m(t)\}$, we wish to construct a state vector $x(t) = [x_0(t), \ldots, x_{N-1}(t)]^T$ whose components $x_i(t)$ and $x_j(t)$, $i, j = 0, \ldots, N-1$, are correlated in a way which emphasizes some one of the desired patterns. In other words, noisy, unwanted correlations between individual components of u(t) are suppressed in the transformation $T: u(t) \to x(t)$. It is required that the state vector x(t) be based on the input, and the system is to find patterns inherent in u(t) and track them. Since x(t) could lock onto a false pattern, i.e., one not contained in $\{p_m(t)\}$, it is desired that x(t) be monitored by the system for validation of its tracking activity.

For example, the input signals may become distorted through momentary, unexpected interference that could produce moving ghost reflections. This may cause x(t) to consist of aspects distributed across two or more patterns $\{p_m(t)\}$ in such a way that a hybrid pattern g(t) is represented, and short-term memory may begin to track this hybrid. The monitoring function guards against this kind of instability by comparing the adapted short-term pattern estimate x(t) against the set of known patterns $\{p_m(t)\}$.

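A neural net based on the Grossberg model is suggested to address this problem. The model is characterized by the equations

$$\dot{x}_i(t) = -\alpha\, x_i(t) + \big(\beta - x_i(t)\big)\big[f_i(x_i(t)) + u_i(t)\big] - \big(\gamma + x_i(t)\big)\sum_{k \neq i}\big[f_k(x_k(t)) + u_k(t)\big] \qquad (1)$$

$$\dot{M}_{ij}(t) = \begin{cases} -\nu\, M_{ij}(t) + \rho\, x_i(t)\, x_j(t), & \text{if } \|s(t) - p_m(t)\| < \epsilon \text{ for some } m, \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

$$s_i(t) = \sum_{j=0}^{N-1} M_{ij}(t)\, x_j(t) \qquad (3)$$
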
for $i, j = 0, \ldots, N-1$ [Grossberg 1975, 1976a, 1977; Cohen, 1983], as shown in Figure 1. These equations implement a short-term memory in the vector x(t) and a long-term memory by means of a "connectivity matrix" M [Ellias & Grossberg, 1975; Grossberg, 1976a and 1976b, 1980, 1981]. The short-term memory acquires its character through the choice of the function f(w), which allows the options of contrast enhancement, pattern selection and matching, or noise suppression. A sigmoid function, e.g., $f(w) \propto [\arctan w]^p$, has been shown to combine these options over the ranges of w where they are most needed, i.e., emphasizing contrasts when w is very weak, pattern selection and replication when w is of moderate intensity, and staying below some saturation level when spikes occur [Grossberg, 1975]. The short-term memory has relatively fast dynamics and tracks some pattern in the input, but it is blind to the identity of this pattern.
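The short-term memory dynamics can be explored numerically. The Python sketch below is a minimal illustration under stated assumptions, not the simulator proposed later in this paper. It integrates the shunting on-center off-surround form $\dot{x}_i = -\alpha x_i + (\beta - x_i)[f(x_i) + u_i] - (\gamma + x_i)\sum_{k \neq i}[f(x_k) + u_k]$ with a forward-Euler step, uses the same f for every node, and takes $f(w) = (\arctan w)^p$ with p = 2 for w > 0 and f = 0 otherwise. The inhibitory offset γ is set to 0, and all parameter values (α = β = 1, dt = 0.01) are illustrative choices.

```python
import math


def f(w, p=2.0):
    """Sigmoid signal function (arctan w)**p for w > 0; zero otherwise (assumed)."""
    return math.atan(w) ** p if w > 0 else 0.0


def stm_step(x, u, dt, alpha=1.0, beta=1.0, gamma=0.0):
    """One forward-Euler step of the shunting on-center off-surround equation."""
    drive = [f(xi) + ui for xi, ui in zip(x, u)]   # f(x_k) + u_k for every node k
    total = sum(drive)
    new_x = []
    for i, xi in enumerate(x):
        excite = (beta - xi) * drive[i]               # on-center term, shunted by beta - x_i
        inhibit = (gamma + xi) * (total - drive[i])   # off-surround term, sum over k != i
        new_x.append(xi + dt * (-alpha * xi + excite - inhibit))
    return new_x


def run_stm(u, steps=2000, dt=0.01, **params):
    """Integrate from x = 0 with the input u held fixed."""
    x = [0.0] * len(u)
    for _ in range(steps):
        x = stm_step(x, u, dt, **params)
    return x


u = [0.2, 1.0, 0.4, 0.1]
x = run_stm(u)
print([round(v, 3) for v in x])
# With these illustrative values, x[1] / x[0] exceeds u[1] / u[0] (contrast
# enhancement by the faster-than-linear f), and every x[i] stays below beta = 1.
print(x[1] / x[0] > u[1] / u[0], max(x) < 1.0)   # True True
```

With γ = 0, setting $\dot{x}_i = 0$ gives $x_i = \beta d_i / (\alpha + \sum_k d_k)$ with $d_i = f(x_i) + u_i$, which shows directly why the steady-state activity stays below β.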


The matrix M corresponds to the long-term memory of the system. It acts as a monitor for the tracking part of the system and can also adapt to reflect slow changes in the stationarity of the patterns $\{p_m(t)\}$. Initially

$$M = p_0(t)p_0(t)^T + p_1(t)p_1(t)^T + \cdots + p_{M-1}(t)p_{M-1}(t)^T,$$

where $M \ll N$. Let us suppose for ease of argument that the $\{p_i(t)\}$ are orthonormal, i.e., $p_i(t)^T p_j(t) = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. Then

$$M\, p_m(t) = \big[p_0(t)p_0(t)^T + \cdots + p_m(t)p_m(t)^T + \cdots + p_{M-1}(t)p_{M-1}(t)^T\big]\, p_m(t) = p_m(t).$$

It has been shown that if a vector y(t) is not identical to any of the $\{p_m(t)\}$, then the operation $M y(t)$ will produce a vector which lies "closest" to one of the $p_m(t)$ [Kohonen 1981, 1983; Grossberg, 1976a]. The degree of acceptability of this closeness may be determined as follows. We form the output vector $s(t) = M x(t)$. If the short-term memory vector x(t) is tracking one of the desired patterns $p_m(t)$, i.e., if $p_m(t)$ has been detected as present in u(t), then $\|s(t) - p_m(t)\| < \epsilon$ for some $\epsilon$ and some m selected from $\{0, \ldots, M-1\}$, i.e., s(t) is "close" to $p_m(t)$. If this closeness requirement is met, we update the matrix M. Otherwise we reset x(t) and begin the search for another pattern.
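The validate-then-update-or-reset rule can be sketched in a few lines. This is a hypothetical discretization: one forward-Euler step of the gated update $\dot{M} = -\nu M + \rho\, x x^T$ with illustrative values ν = 0.1, ρ = 1, dt = 0.01 and tolerance ε = 0.3, where "reset" is taken to mean setting x(t) to zero. The helper names (`outer_sum`, `monitor_step`) are illustrative.

```python
import math


def outer_sum(patterns):
    """M = sum_m p_m p_m^T (autoassociative initialization)."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)] for i in range(n)]


def matvec(M, x):
    """s_i = sum_j M_ij x_j, i.e. the output vector s = M x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def monitor_step(M, x, patterns, eps, dt=0.01, nu=0.1, rho=1.0):
    """Validate STM against LTM, then either update LTM or reset STM.

    Returns (M, x, matched_index); matched_index is None when the STM
    state was rejected and reset.
    """
    s = matvec(M, x)
    dists = [distance(s, p) for p in patterns]
    m = min(range(len(patterns)), key=lambda k: dists[k])
    if dists[m] < eps:
        # Closeness test passed: Euler step of dM/dt = -nu M + rho x x^T.
        n = len(x)
        M = [[M[i][j] + dt * (-nu * M[i][j] + rho * x[i] * x[j]) for j in range(n)]
             for i in range(n)]
        return M, x, m
    # Closeness test failed: reset STM so the search can start over.
    return M, [0.0] * len(x), None


patterns = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]   # orthonormal
M = outer_sum(patterns)
M, x, m = monitor_step(M, [0.0, 1.0, 0.0, 0.0], patterns, eps=0.3)
print(m)                                  # 1
r = 1 / math.sqrt(2)                      # a hybrid of the two stored patterns
_, x_hybrid, m_hybrid = monitor_step(M, [r, r, 0.0, 0.0], patterns, eps=0.3)
print(m_hybrid, x_hybrid)                 # None [0.0, 0.0, 0.0, 0.0]
```

The hybrid input illustrates the "ghost" case described above: s(t) lies between the two stored patterns, fails the closeness test for both, and STM is reset instead of reinforcing the hybrid in LTM.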


Coupling  of  Slow  and  Fast  Dynamics 

The monitoring of the fast tracker (STM) by the long-term memory component (LTM) requires that the dynamics of both be coupled into a larger system. When the fast and slow parts of a system are tightly coupled, boundary-layer instabilities arise, such as those found in turbulence and shear flows. To avoid these phenomena, tightly coupled systems are often accompanied by restrictions which allow only one part of the system to make a transition at a given instant in time. An example is a model which construes Equation (2) as a running-time estimate of an underlying wide-sense stationary process. The model featured here avoids many of these restrictions by maintaining a "loose" coupling of the two time scales.

To clarify this point further, note that Equation (1) does not contain an explicit reference to LTM; LTM enters only through the closeness test that decides whether M is updated or x(t) is reset. By contrast, classical approaches have frequently maintained a tight coupling of the two time scales, e.g., in recursive identification techniques and in state-variable feedback schemes.



The  Role  of  LTM  and  STM  in  Recursive  Identification  Systems 

Recursive identification schemes typically take as their point of departure some variant of Wiener filtering theory or mean-square estimation. The signal model is that the input $u_i(t)$ contains a signal $p_i(t)$ and a noise term $n_i(t)$: $u_i(t) = p_i(t) + n_i(t)$. The input signal $u_i(t)$ thus is presumed to contain a deterministic part $p_i(t)$, such that knowledge of neighboring values $p_j(t)$, $j \neq i$, allows us to construct $p_i(t)$. However, $u_i(t)$ also contains a part $n_i(t)$ which fluctuates randomly and for which knowledge of its values at any and all other positions j and times $t - \tau$ is of no help in estimating $n_i(t)$. Viewed separately, $n_i(t)$ has instantaneously fast dynamics, whereas $p_i(t)$ has relatively slower dynamics. Thus, $u_i(t)$ is a mixed process whose deterministic part may be estimated by a parameterized process as follows:

$$\hat{p}_i(t) = \sum_{j \neq i} x_j(t)\, u_j(t),$$

where x(t) is a parameter vector. If we know x(t), then we can form the above estimate and at least approximately reconstruct the signal $p_i(t)$. Wiener filtering theory does this by requiring that x(t) be chosen such that the mean squared error $\epsilon_i = E\{[p_i(t) - \hat{p}_i(t)]^2\}$ is minimized. The result of this minimization process is the familiar Wiener-Hopf equation (Papoulis, 1984)

$$b_j(t) = \sum_{k \neq i} R(j,k)\, x_k(t), \qquad j \neq i,$$

where

$$b_j(t) = E\big(p_i(t)\, u_j(t)\big) \quad \text{and} \quad R(j,k) = E\big(u_j(t)\, u_k(t)\big).$$

If u(t) = p(t) + n(t) is wide-sense stationary and if n(t) is a spatially uncorrelated noise process, then

$$E\big(u_i(t)\, u_j(t)\big) = R_{uu}(i,j), \qquad R_{uu} \approx E\big(p\, p^T\big) = R$$

for some stationary pattern p. Hence, the vector equation b = Rx has the solution $x = R^{-1} b$. To estimate this recursively, a running-time estimate $R(t) = R(t-1) + \rho\, u(t)\, u(t)^T$ can be formed, and then $x(t) = x(t-1) + \rho\, R^{-1}(t)\, b(t)$ can be computed, where b(t) is either known a priori or estimated, e.g., $b(t) = x(t-1)\, x(t-1)^T u(t)$. There are many variants of this general procedure (Ljung, 1983; Monzingo and Miller, 1980). Note that R plays the role of LTM and that its dynamics are by hypothesis confined to functional variations within a temporal ensemble. Although this model does not cleanly separate LTM (R) and STM (x) in terms of slow and fast dynamics, the rates of convergence to steady state of the $x_i$ are known to be inversely proportional to the largest eigenvalue of some ideal R. Clearly, x cannot reach a steady-state neighborhood if $\|R(t) - R(t-1)\|$ does not become small for large t. This can only be guaranteed by placing assumptions on the model, e.g., that the process is ergodic.
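The Wiener-Hopf solution can be approximated from data by accumulating running estimates of R and b and then solving b = Rx. The sketch below is a variant of the procedure above, not the exact recursion $x(t) = x(t-1) + \rho R^{-1}(t) b(t)$: it accumulates $R(t) = R(t-1) + u u^T$ and $b(t) = b(t-1) + d\,u$, where d(t) is the desired signal (playing the role of $p_i(t)$), and solves once at the end. The step size ρ is omitted because it scales R and b equally and cancels. The "true" parameter vector `w_true` and the noise level are hypothetical test values.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def running_wiener(samples):
    """Accumulate R(t) = R(t-1) + u u^T and b(t) = b(t-1) + d u, then solve R x = b."""
    n = len(samples[0][0])
    R = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for u, d in samples:
        for i in range(n):
            b[i] += d * u[i]
            for j in range(n):
                R[i][j] += u[i] * u[j]
    return solve(R, b)


rng = random.Random(0)
w_true = [0.5, -0.25, 0.8]        # hypothetical "ideal" parameter vector
samples = []
for _ in range(2000):
    u = [rng.gauss(0.0, 1.0) for _ in w_true]
    d = sum(w * ui for w, ui in zip(w_true, u)) + rng.gauss(0.0, 0.1)   # noisy target
    samples.append((u, d))
x = running_wiener(samples)
print([round(v, 2) for v in x])   # close to w_true
```

Because R here is an accumulated sum over a stationary source, $\|R(t) - R(t-1)\|$ becomes small relative to $\|R(t)\|$ as t grows, which is the condition the text identifies for x to settle.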

The analogue to this type of recursive identification procedure for the proposed neural net model is to replace Equation (1) with an equation of the form

$$\dot{x}_i(t) = -\alpha\, x_i(t) + x_i(t) \sum_{j=0}^{N-1} Q_{ij}(t)\, u_j(t), \qquad (1a)$$

where $Q = M^{-1}$, and to replace Equation (2) with

$$\dot{M}_{ij}(t) = -\nu\, M_{ij}(t) + \rho\, u_i(t)\, u_j(t). \qquad (2a)$$

Note that whereas the recursive identification schemes use LTM to store one pattern, the neural net model can store a multiplicity of patterns. In other words, the parameter vector x(t) in the recursive identification scheme plays the role of a retrieval key, whereas in the neural net model LTM is now hetero-associative and initialized, e.g., by $M = x_0 p_0^T + \cdots$ instead of by $M = p_0 p_0^T + \cdots$.

In Equation (2), LTM is allowed to decay slowly or to be reinforced by patterns which prevail in STM, provided they are sufficiently close to one of the initial patterns $\{p_m\}$. Some variants of Equation (2) are

$$\dot{M}_{ij}(t) = -\nu\, M_{ij}(t) + \rho\, s_i(t)\, x_j(t) \qquad (2b)$$

and

$$\dot{M}_{ij}(t) = -\nu\, M_{ij}(t) + \rho\, s_i(t)\, s_j(t), \qquad (2c)$$

which guarantee that LTM is reinforced by patterns which are projections onto some of the $\{p_m\}$, since $s(t) = M(t)\, x(t)$. On the other hand, we might want to allow LTM to slowly acquire unforeseen patterns, so that an entirely new and unanticipated pattern is recorded and monitored by the system provided that it is persistent enough. Such a situation would be reflected in

$$\dot{M}_{ij}(t) = -\nu\, M_{ij}(t) + \rho\, x_i(t)\, x_j(t). \qquad (2d)$$
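The difference between the variants is easiest to see with an unforeseen pattern. The sketch below applies forward-Euler steps of (2b), (2c), and (2d), with illustrative values ν = 0.1, ρ = 1, dt = 0.01, to an LTM that stores only the first two unit vectors of a three-dimensional space while STM holds the third. Because s = Mx is then zero, (2b) and (2c) leave the new direction unlearned, whereas (2d) begins to record it. The function name `ltm_update` is hypothetical.

```python
def ltm_update(M, x, variant, dt=0.01, nu=0.1, rho=1.0):
    """One forward-Euler step of LTM variant (2b), (2c) or (2d).

    (2b): dM_ij/dt = -nu M_ij + rho s_i x_j
    (2c): dM_ij/dt = -nu M_ij + rho s_i s_j
    (2d): dM_ij/dt = -nu M_ij + rho x_i x_j
    with s = M x.
    """
    n = len(x)
    s = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    left, right = {"2b": (s, x), "2c": (s, s), "2d": (x, x)}[variant]
    return [[M[i][j] + dt * (-nu * M[i][j] + rho * left[i] * right[j])
             for j in range(n)] for i in range(n)]


M0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # stores e1 and e2 only
novel = [0.0, 0.0, 1.0]                                      # unforeseen pattern in STM
results = {}
for v in ("2b", "2c", "2d"):
    M = M0
    for _ in range(100):
        M = ltm_update(M, novel, v)
    results[v] = M[2][2]      # how strongly the new direction is now stored
print(results)
```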

Finally, the neural net could make use of two LTMs, one which has a permanent set of patterns $\{p_m\}$ and a second which is allowed to slowly acquire STM patterns. Such a system would have a permanent record of its initial patterns but would have the capability of responding to unforeseen situations. In addition, the system could activate one set of "routine" control signals when one of the original patterns is recognized and activate a second set of control signals when a new and persistent pattern emerges. This might be schematized as shown in Figure 2.

The equations for the network are then
$$M^{(1)}(0) = p_0 p_0^T + p_1 p_1^T + \cdots + p_{M-1} p_{M-1}^T, \qquad M^{(2)}(0) = 0,$$

$$\dot{M}^{(1)}(t) = 0, \qquad \dot{M}^{(2)}(t) = -\nu\, M^{(2)}(t) + \rho\, x(t)\, x(t)^T.$$

The dynamics of STM are as before. Note that in this scheme both LTMs share the same STM. There are many other possibilities, however.
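A minimal sketch of the two-LTM scheme follows, assuming forward-Euler updates and the same illustrative ν, ρ, dt as above. The control logic is hypothetical: a "routine" signal is issued when $M^{(1)}x$ lies within ε of a stored pattern, and a "novel" signal when the response of the adaptive memory, $\|M^{(2)}x\|$, exceeds an arbitrary `novelty_level`, which happens only after a new pattern has persisted for some time. The class name `DualLTM` is illustrative.

```python
import math


def outer_sum(vectors):
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(n)] for i in range(n)]


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


class DualLTM:
    """Permanent LTM M1 plus adaptive LTM M2, both reading the same STM vector x."""

    def __init__(self, patterns, nu=0.1, rho=1.0, dt=0.01):
        self.patterns = patterns
        self.M1 = outer_sum(patterns)             # M1(0) = sum p_m p_m^T, never changes
        n = len(patterns[0])
        self.M2 = [[0.0] * n for _ in range(n)]   # M2(0) = 0
        self.nu, self.rho, self.dt = nu, rho, dt

    def step(self, x, eps=0.3, novelty_level=0.5):
        # Euler step of dM2/dt = -nu M2 + rho x x^T; M1 stays frozen.
        n = len(x)
        self.M2 = [[self.M2[i][j] + self.dt * (-self.nu * self.M2[i][j]
                    + self.rho * x[i] * x[j]) for j in range(n)] for i in range(n)]
        s1 = matvec(self.M1, x)
        for m, p in enumerate(self.patterns):
            if norm([a - b for a, b in zip(s1, p)]) < eps:
                return ("routine", m)             # an original pattern is recognized
        if norm(matvec(self.M2, x)) > novelty_level:
            return ("novel", None)                # a persistent, unforeseen pattern
        return ("none", None)


net = DualLTM([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
first = net.step([1.0, 0.0, 0.0])
print(first)                                  # ('routine', 0)
novel = [0.0, 0.0, 1.0]
history = [net.step(novel)[0] for _ in range(200)]
print(history[0], history[-1])                # none novel
```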

Stability  of  the  Neural  Net 

Stability is dealt with in the proposed neural net model in two ways. First, in Equation (1) we note that the term $(\beta - x_i(t))[f_i(x_i(t)) + u_i(t)]$ adds to $x_i(t)$ heavily when $x_i(t) \approx 0$, but its contribution falls to zero as $x_i(t)$ approaches $\beta$. Since this is the only term which contributes positively to $\dot{x}_i(t)$, $x_i(t)$ is bounded above by the saturation level $\beta$ [Grossberg, 1960].

The other factor which enforces stability is the use of inhibitory signals which prevent STM from saturating at level $\beta$. Each node $v_i$, characterized by a state component $x_i(t)$, "sees" the outside environment from its own point of view, which prefers some types of signals or some weighted combination of signals. The net as a whole attempts (a) at each node $v_i$ to excite activity $x_i(t)$ in proportion to the content $u_i(t)$ in the input signal vector u(t) to which $v_i$ is preferentially tuned; and (b) at each node $v_j$, $j \neq i$, to inhibit activity $x_j(t)$ in proportion to the content $u_i(t)$ of the input vector and the recent history of $x_i(t)$, i.e., $f_i(x_i(t))$, shunted by the activity $x_j(t)$.

The excitation of a node and inhibition at all other nodes is the "on-center off-surround" action seen from the point of view of one node. This simultaneous activity going on at each node results in a competitive struggle in which some subset $\{x_k(t)\}$ of $\{x_i(t)\}$ will grow and remain strong, and the remaining $x_i(t)$ will dwindle and fall below some threshold. This arrangement will prevail for as long as some input pattern persists. The result is a vector x(t) in STM which represents a pattern that is global to the net and which can now be compared with the initial patterns $\{p_m\}$ in LTM via Equation (3) or any variants thereof [Cohen and Grossberg, 1983].

Modularity  and  Neural  Nets 

Neural nets can generate outputs which behave as either excitatory or inhibitory inputs to other neural nets. It is therefore relatively easy to use a modular approach to organizing a large-scale neural net. In fact, the Grossberg outstar configuration [Grossberg, 1978, 1981] allows an entire net to be excited or inhibited by the use of a single control signal. Thus, the net as a whole is capable of implementing switching functions or attention-focusing mechanisms which might inhibit some subnetwork activity and excite activity elsewhere, e.g., to conserve resources. This is an important consideration for large-scale systems operating in an environment where resources are limited.

Neural Model Simulator Experimental Implementation


Optical processing/computing is under investigation by the University of Dayton Research Institute (UDRI) and other research organizations in recognition of its perceived natural strengths, which are (a) speed, (b) parallelism, and (c) capability for dense interconnections [IEEE Proceedings, July 1984]. Optical computing approaches may best be used initially as a tool to solve the matrix algebraic model equations, which are the most computationally intensive. Ultimately, device realizations such as optically interconnected arrays of optical nonlinear devices (e.g., threshold logic elements) have great potential for implementing large, richly connected networks having neural characteristics.

For  the  neural  simulator  suggested  here,  optical  processing 
techniques  and  hardware  primarily  are  envisioned  in  a  service 
role  to  implement  a  major  portion  of  a  laboratory  test  bed  to  be 
used  to  investigate  adaptive  neural  networks.  Optics  will  be 
used  to  speed  up  the  simulator  by  performing  matrix  algebraic 
operations  much  faster  than  could  digital  equipment  of  comparable 
cost and size. The ultimate goal of an all-optical neural processor should not be neglected, although such an implementation is viewed as premature at this time.

The  suggested  simulator  design  is  a  hybrid  (optical/digital) 
system  in  which  the  parts  are  assigned  duties  most  appropriate  to 
their respective natural strengths. The concept of modeling or implementing neural networks and associative memories using optical processing systems recently has been addressed both theoretically and experimentally [Farhat, 1985a and 1985b; Fisher, 1984; Hecht-Nielsen, 1982]. The suggested design is related in subject but comprises an innovative approach based essentially on an extension of the Grossberg model rather than on the Hopfield neural network model [Grossberg, 1976a, 1983; Hopfield, 1982].

The  form  of  the  suggested  system  is  shown  schematically  in 
Figure  3.  Numerous  simple  optical  components  such  as  cylindrical 
and  spherical  lenses  have  been  omitted  from  the  figure  for  the 
sake  of  clarity.  In  all  cases  the  omitted  elements  are  being 
used  in  straightforward  ways  to  perform  imaging  or  collimation, 
with  parameters  well  within  their  performance  capabilities. 
Operation  of  the  optical  part  of  the  system  to  model  Equations 
(2)  and  (3)  of  the  proposed  neural  network  will  now  be  described. 
The system uses a modification of a well-known matrix-vector multiplication architecture [Goodman, 1978]. Spatial light modulators (SLMs) are a key real-time device requirement for optical processing and may be viewed as electronically programmable masks or transparencies. The updating of the M matrix stored in SLM2 per Equation (2) is done using the optical train including LD1, SLM2, BS1, D1, LD2, and SLM3. The current M values in SLM2 are modulated onto a uniform beam and imaged through beamsplitter BS1 onto 2-D detector array D1. The update term of Equation (2) is written by the electronic processor into SLM3 and imaged with a bounce off BS1 onto D1 in proper element registration with the image from SLM2. Element-by-element addition occurs through the natural intensity integration of the D1 detector elements, which are read out, properly scaled, and formed as the next array M in SLM2. The other mode of the optical system implements Equation (3) using the same matrix-vector multiplication architecture. The optical train for this mode is LD1, SLM1, SLM2, BS1, and D2. Vector x is input to SLM1 (e.g., an acousto-optic cell illuminated by laser diode LD1) and is properly connected to the modulator elements in SLM2 (vertically imaged and horizontally broadcast). After a bounce off BS1, the desired output vector elements are collected on linear detector array D2 by optics which image horizontally and collect vertically. The detailed design of the optical architecture requires further study, and Figure 3 should be considered an example only. It may prove advantageous, for example, to perform the matrix update electronically, since only addition is involved.
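The two optical modes can be mimicked in a few lines, which makes the division of labor concrete. In this idealized sketch (no noise, perfect registration, lossless optics), the product mode multiplies each input intensity $x_j$ by the transmittance $M_{ij}$ and lets each detector element integrate over j, giving Equation (3); the update mode adds the SLM2 and SLM3 images element by element on D1. All values must be nonnegative because incoherent intensities are unipolar. The rescaling rule (divide by the peak when it exceeds a maximum transmittance of 1) is a hypothetical stand-in for "properly scaled", and the function names are illustrative.

```python
def optical_matvec(M, x):
    """Idealized matrix-vector mode: s_i = sum_j M_ij x_j.

    x_j is the light intensity from SLM1 element j, spread across the SLM2
    elements it is connected to; M_ij is the intensity transmittance of SLM2
    element (i, j); detector element i integrates the transmitted light over j.
    """
    if any(v < 0 for v in x) or any(v < 0 or v > 1 for row in M for v in row):
        raise ValueError("incoherent intensities and transmittances must be unipolar")
    plane = [[mij * xj for mij, xj in zip(row, x)] for row in M]   # light after SLM2
    return [sum(row) for row in plane]                              # detector integration


def optical_update(M, dM):
    """Idealized update mode: superpose SLM2 (M) and SLM3 (dM) images on D1, rescale.

    Returns the new M and the scale factor applied (hypothetical scaling rule).
    """
    if any(v < 0 for row in dM for v in row):
        raise ValueError("intensity addition cannot represent negative updates")
    summed = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, dM)]
    peak = max(max(row) for row in summed)
    scale = 1.0 / peak if peak > 1.0 else 1.0
    return [[v * scale for v in row] for row in summed], scale


M = [[0.5, 0.25], [0.0, 1.0]]
print(optical_matvec(M, [1.0, 0.5]))                   # [0.625, 0.5]
M_next, scale = optical_update(M, [[0.75, 0.0], [0.25, 0.0]])
print(M_next, scale)
```

The electronics would need to keep track of the accumulated scale factor so that later results can be interpreted in absolute units.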

Issues such as the representation of bipolar values in intrinsically unipolar incoherent optical systems (and, if binary SLMs are used, the processing of analog values) must be considered. Figure 4 is an example of a modification of the matrix-vector multiplier architecture [Goodman, 1978] of Figure 3 which handles bipolar values in both the vectors and the matrix and also performs two parallel matrix-vector multiplies using both time sequencing and spatial partitioning. Two parallel multiplies would be used, for example, to model the interaction of two associative memories. Two LTM matrices, A(1) and A(2), are stored in two-dimensional light modulator SLM2, which is partitioned vertically (parts A and B) to hold the two separate matrices. Horizontal (+ and -) partitioning separates the positive and negative portions of the matrices. The linear SLM, SLM1, is used to input x vectors and is also partitioned vertically (A and B parts) to correspond to the two LTM matrices stored in SLM2. The four-part sequence to perform one complete cycle of operation is as follows. Input modulator SLM1 is first loaded with the positive values of the first input vector $x_1$ in its A section and zeroes in its B section. A matrix-vector multiplication is performed optically, with results appearing on linear detector D1. Since the B section of SLM1 contains zeroes, the resulting vector is the product of the A sections of SLM1 and SLM2. Because of the polarity (+/-) partitioning of SLM2, the output vector is presented in polarity-partitioned format on D1: D1 is twice the length of a bipolar vector and contains vector components corresponding to positive and negative contributions. These contributions to the A result are read out of D1 by the electronics and stored in digital memory. Next, the negative elements of input vector $x_1$ are loaded into the A part of SLM1 and the matrix-vector cycle is repeated. A second set of polarity-partitioned result vector components is obtained, this time in reversed polarity order. These are read out and electronically added to the first results to obtain the complete bipolar vector result for section A. An identical two-step sequence using the B parts of the modulators yields the second vector result.
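The polarity bookkeeping in this sequence amounts to writing $A = A^{+} - A^{-}$ and $x = x^{+} - x^{-}$ with all four parts nonnegative, so that $Ax = (A^{+}x^{+} + A^{-}x^{-}) - (A^{-}x^{+} + A^{+}x^{-})$. The sketch below models one section under that decomposition: each unipolar product stands for the light reaching one half of the detector in one optical pass, the two passes are the two steps described above, and the final sum is the electronic combination. The function names are illustrative.

```python
def split(v):
    """Nonnegative (positive part, negative part) of a bipolar value."""
    return (v, 0.0) if v >= 0 else (0.0, -v)


def unipolar_matvec(M, x):
    """Light reaching one detector half in one pass: all entries are nonnegative."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def bipolar_matvec(A, x):
    """Bipolar A x assembled from two optical passes with polarity-partitioned readout."""
    A_pos = [[split(a)[0] for a in row] for row in A]   # "+" half of SLM2
    A_neg = [[split(a)[1] for a in row] for row in A]   # "-" half of SLM2
    x_pos = [split(v)[0] for v in x]
    x_neg = [split(v)[1] for v in x]
    # Pass 1: positive input elements; detector halves hold (+, -) contributions.
    d1_plus, d1_minus = unipolar_matvec(A_pos, x_pos), unipolar_matvec(A_neg, x_pos)
    # Pass 2: negative input elements; the polarity order of the halves is reversed.
    d2_minus, d2_plus = unipolar_matvec(A_pos, x_neg), unipolar_matvec(A_neg, x_neg)
    # Electronic combination of the stored readouts.
    return [p1 + p2 - m1 - m2
            for p1, p2, m1, m2 in zip(d1_plus, d2_plus, d1_minus, d2_minus)]


def four_part_cycle(A1, A2, xa, xb):
    """Sections A and B are handled in turn, the idle section holding zeroes."""
    return bipolar_matvec(A1, xa), bipolar_matvec(A2, xb)


A1 = [[1.0, -2.0], [-0.5, 0.25]]
x = [0.5, -1.0]
print(bipolar_matvec(A1, x))  # [2.5, -0.5]
```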

The  electronic  system  consists  of  processing,  timing,  and 
other  circuitry  to  interface  with  the  electro-optic  components. 
Processing  is  needed  to  implement  Equation  (1)  and  to  perform 
scaling  and  other  housekeeping  operations  required  by  signals 
transmitted  to  and  received  from  the  SLMs  and  detector  arrays, 
respectively. The processing load required to implement Equation (1) will vary widely depending on the form selected for f(w). Initially, forms which result in a scalar equation (i.e., relatively low processor workload) will be investigated, but it is anticipated that forms requiring vector-matrix operations also will be of interest. Nevertheless, the choice of digital rather than optical implementation for Equation (1) is justified by the need for flexibility to alter its form (e.g., the form of f(w)), since such flexibility is difficult to implement optically. The optical part of the system is used for matrix-vector operations and benefits from the inherent speed, parallelism, and connectivity of optical implementations.

Some  discussion  of  the  hardware  requirements  implied  by  the 
types  of  systems  proposed  (Figures  3  and  4)  is  in  order,  since 
they involve real-time devices. By far the most demanding device requirement involved is that of the SLMs, since this device technology is not generally at the off-the-shelf stage. SLM1 can be implemented with an acousto-optic cell with very low risk. All the remaining optical components can be implemented with off-the-shelf, commercially available items, many of which are already on hand. In addition to the acousto-optic cell, which can serve as a 1-D SLM, two promising types of 2-D SLM (among others) might be used: the Litton/Semetex magneto-optic LIGHT-MOD™ device [Ross, 1982], and an electronically addressed twisted nematic liquid crystal modulator obtained by modifying a consumer pocket TV unit
[Fisher, 1985]. The LIGHT-MOD devices are 48-by-48-element arrays which have the limitation of binary modulation; 128-by-128-element LIGHT-MODs are also commercially available. The probable maximum size array to be modeled will be about 6-by-6, which allows a 4-by-4-bit array on the LIGHT-MOD to be assigned to each unipolar element of the model (a 48-by-48 device then provides 12-by-12 such elements), thus enabling techniques such as area weighting to be used to obtain analog modulation. Some level-quantization noise is expected with a 4-by-4-element scheme. Because of the anticipated robust nature of neural networks, we expect that the resulting performance degradation will be small. This in itself is an interesting area of research, since many schemes might benefit in terms of practical hardware requirements if quantization could be tolerated. The LIGHT-MOD
devices  can  also  be  operated  in  a  duty  cycle  modulation  mode,  in 
which  analog  signal  values  are  represented  by  the  fraction  of 
time  that  a  modulating  element  is  turned  on  during  a  cycle.  This 
mode has been successfully used by Litton Data Systems, developers of the device, to demonstrate gray-scale operation at TV frame rates using a 128-by-128-element device as a display. Both the area-weighting and duty-cycle modulation methods exact a price in drive-circuit and/or software complexity. An assessment of this tradeoff should be part of an initial design study, as should consideration of other types of SLMs, such as the liquid crystal (TV) unit. The liquid crystal devices have not been fully characterized; however, they are known to have 120-by-140-element resolution, operate at TV frame rates, and have analog modulation capability. We have measured contrast ratios of 10:1 on these devices. Obviously, 2-D SLMs can be under-utilized as a linear array for SLM1 if expedient. Other state-of-the-art devices, such as the Texas Instruments Deformable Mirror Device, may also be available.
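The two ways of obtaining gray levels from a binary LIGHT-MOD can be compared with a short sketch. A 4-by-4 binary cell can show 17 transmittance levels (0 to 16 pixels on), so area weighting represents any value in [0, 1] to within 1/32; duty-cycle modulation does the same in time. The raster pixel fill order and the 16 subframes per cycle below are hypothetical choices, not properties of the device.

```python
def clip01(v):
    """Clamp a value into the modulator range [0, 1]."""
    return min(max(v, 0.0), 1.0)


def area_weight(v, cells=4):
    """Binary cells x cells pattern whose fraction of 'on' pixels approximates v.

    Pixels are switched on in raster order (a hypothetical choice).
    Returns the binary cell and the effective transmittance it represents.
    """
    n = cells * cells
    k = round(clip01(v) * n)                      # number of pixels switched on
    cell = [[1 if r * cells + c < k else 0 for c in range(cells)] for r in range(cells)]
    return cell, k / n


def duty_cycle(v, subframes=16):
    """On/off schedule over one cycle whose time average approximates v."""
    k = round(clip01(v) * subframes)
    return [1 if t < k else 0 for t in range(subframes)]


cell, t = area_weight(0.3)
print(t, sum(map(sum, cell)))            # 0.3125 5
print(sum(duty_cycle(0.3)) / 16)         # 0.3125
```

Both schemes trade resolution for something else: area weighting spends spatial elements (hence the 6-by-6 model size), while duty-cycle modulation spends time and drive-circuit complexity.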

The exact form and detailed design of the electronic portions of the hybrid system require further definition, but Figure 5 presents a block diagram which indicates the main components. Signal flow is circular through the optical and electronic processors, with each subsystem performing the tasks it does best, as mentioned above. Specialized interface circuits involving functions such as A/D and D/A conversion, amplification, and sample-and-hold are required in the signal flow paths between the optical and electronic domains. In addition, dedicated circuits providing timing, control, and data buffering (as indicated) will probably be required. The electronic processor could be a 16- or 32-bit bus-oriented microprocessor system using, for example, the 68000 chip. This processor will have both real-time functions during simulation runs and other general-purpose duties, including software development, data analysis, and system executive functions. Off-loading the real-time data buffering and timing control functions to custom circuits (as indicated) is expected to be mandatory to enable the electronic processor to handle the computational load implied by simulation at attractive real-time rates. Mass storage is necessary both for program development and storage and for storage of simulation data and case descriptions.
AND OPTICAL PROCESSING SYSTEMS
Optical processing systems, particularly analog processors, often depend on coherent light to perform mathematical operations. The assumption of full spatial coherence leads to a description of optical propagation that is linear in the field amplitude and that relates conjugate planes through Fourier transforms. The assumption of temporal coherence (monochromaticity) eliminates wavelength scaling effects and greatly simplifies the system description. The "classical" coherent optical processor then uses laser illumination, a two-dimensional input in the form of a transparency, lenses to create the Fourier transform of the input, a means of manipulating the Fourier transform (usually a hologram), additional lenses to re-form the filtered image, and a two-dimensional output sensor (usually a television camera). The criticisms of such a system, including its inflexibility and limited accuracy, are by now well known. The standard solutions to these problems, including the use of two-dimensional spatial light modulators for the input and Fourier plane filters, are equally well known. These solutions compound the accuracy problem by limiting the resolution of the system to several dozen wavelengths. The price paid in component cost and power consumption is also substantial.
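The Fourier-transform description of the classical coherent processor can be sketched numerically. The Python below is a one-dimensional, sampled model under idealized assumptions: perfect lenses, a unit-magnification geometry, and the periodic boundaries implied by the discrete Fourier transform. The finite-difference filter `edge_filter` is a hypothetical choice that illustrates edge enhancement; it is not a filter taken from this report. A camera in the output plane would record |field|².

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform; the inverse is scaled by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def coherent_4f(field, transfer):
    """First lens: Fourier transform; Fourier plane: multiply by the filter;
    second lens: transform back. Returns the output field amplitude."""
    spectrum = dft(field)
    filtered = [s * h for s, h in zip(spectrum, transfer)]
    return dft(filtered, inverse=True)


def edge_filter(n):
    """Transfer function of y[k] = x[k] - x[k-1]: H[m] = 1 - exp(-2*pi*i*m/n)."""
    return [1 - cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]


step = [0, 0, 0, 0, 1, 1, 1, 1]               # input transparency (amplitude)
out = coherent_4f(step, edge_filter(len(step)))
print([round(abs(v) ** 2, 6) for v in out])  # bright only at the two edges (index 0 wraps around)
```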

In an attempt to circumvent the accuracy and flexibility problems of analog processors, the optical processing community has turned to digital processors, as the electronic processing community did years ago. In a digital optical processor the input and output are abstract numbers whose magnitudes are represented by light intensity levels. Even if the input representation is binary, all optical computers proposed so far have a multilevel output, referred to as mixed radix representation. Practical implementations of optical computers require modulating devices, and the available devices are inefficient. To overcome this problem, recent attention has centered on bistable optical devices, which exhibit thresholding and gain. A more subtle drawback of optical computing is that all systems proposed so far are based on a geometrical optics design approach. Since geometrical optics is only an approximation of how light propagates, these optical computers encounter unanticipated difficulties when put into practice, particularly from diffraction effects.
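The mixed radix output mentioned above is easy to see in the digital-multiplication-by-analog-convolution idea: convolving the bit strings of two binary numbers gives output "digits" that can exceed 1, and a separate carry step is needed to recover a true binary result. The Python sketch below uses an ordinary nested loop in place of the optical convolver; the helper names are illustrative only.

```python
def bits_le(n):
    """Binary digits of a non-negative integer, least significant first."""
    return [int(c) for c in reversed(bin(n)[2:])]


def convolve_digits(a_bits, b_bits):
    """Discrete convolution of two digit strings (stand-in for the optical step).
    Each output digit is a sum of products, so it may be larger than 1."""
    out = [0] * (len(a_bits) + len(b_bits) - 1)
    for i, a in enumerate(a_bits):
        for j, b in enumerate(b_bits):
            out[i + j] += a * b
    return out


def resolve_carries(digits, radix=2):
    """Convert mixed-radix digits (still weighted by radix**k) to standard digits."""
    result, carry = [], 0
    for d in digits:
        total = d + carry
        result.append(total % radix)
        carry = total // radix
    while carry:
        result.append(carry % radix)
        carry //= radix
    return result


mixed = convolve_digits(bits_le(13), bits_le(11))
print(mixed)                   # [1, 1, 1, 3, 1, 1, 1]: the digit 3 is not binary
print(resolve_carries(mixed))  # [1, 1, 1, 1, 0, 0, 0, 1], i.e. 143 = 13 * 11
```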

A  fruitful  approach  to  using  optics  for  computing  and 
signal  processing  is  to  gain  a  deeper  understanding  of  how  an 
optical  system  works  and  to  use  this  understanding  to  integrate 
optics  into  electronic  processors.  Optimum  optical  processing 
system  designs  or  design  tradeoffs  might  be  identified  through  a 
thorough  study  of  optical  propagation  from  a  coherence  theory 
viewpoint  with  an  emphasis  on  optical  processing  applications. 

The cross-spectral density function1 has recently been used as a tool to analyze imaging and processing systems.2 The cross-spectral density function has the advantage of being perfectly general in both spatial and temporal coherence. It is also equivalent to other recent analyses of generalized imaging systems.3,4 The philosophy behind designing optical processors from a coherence viewpoint is that coherence is introduced into an optical system through a generalized grating, the standard Ronchi ruling being the simplest type. With proper shaping, the coherence function may be made to sample an input or to carry information. A second grating combines mutually coherent points to establish the processing operation. We present here two examples of how a thorough knowledge of coherence theory would enhance the design of optical processors and computers.
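For reference, the cross-spectral density function can be written in its standard textbook form. The notation below (field U at frequency ν, amplitude impulse response h of the optical system) is assumed for illustration and is not taken from the cited references. The second relation shows how a linear system propagates partially coherent light; the third defines the spectral and total intensity; the last gives the fully incoherent limit (proportionality constant absorbed into S_in), in which only intensities are transferred.

```latex
W(x_1, x_2, \nu) = \left\langle U^{*}(x_1, \nu)\, U(x_2, \nu) \right\rangle

W_{\mathrm{out}}(x_1, x_2, \nu) = \iint h^{*}(x_1, \xi_1, \nu)\, h(x_2, \xi_2, \nu)\,
    W_{\mathrm{in}}(\xi_1, \xi_2, \nu)\, d\xi_1\, d\xi_2

S(x, \nu) = W(x, x, \nu), \qquad I(x) = \int S(x, \nu)\, d\nu

W_{\mathrm{in}}(\xi_1, \xi_2, \nu) = S_{\mathrm{in}}(\xi_1, \nu)\, \delta(\xi_1 - \xi_2)
\;\Rightarrow\;
S_{\mathrm{out}}(x, \nu) = \int \lvert h(x, \xi, \nu) \rvert^{2}\, S_{\mathrm{in}}(\xi, \nu)\, d\xi
```

Because h depends on ν, each spectral component is filtered on its own scale, which is the wavelength dependence that matters for broad-band processors.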

The first example is an analog processor, and the second is a digital processor.


The most common input for an analog optical processor is a two-dimensional image. In a coherent processor the image must be converted to a transparency and illuminated by a laser. In an incoherent system the image is manipulated directly (and passively) by appropriately placed gratings and apertures. No spatial light modulators or extra illumination sources are needed, resulting in great savings in cost, complexity, and power consumption. Systems that perform edge detection have already been described in the literature.4,5 These systems use simple Ronchi rulings to introduce coherence in one dimension. By using generalized gratings (essentially arrays of pupils), more complicated operations are possible. Because of the broad-band nature of the input, the output can be separated by color filters to give differently scaled results at different wavelengths. The filtering operations can be performed at any wavelength for which gratings and detectors exist. In an end application we might have an infrared imaging system with a sophisticated lens on the front end. The lens would perform a specific operation that pre-filters the image before it reaches the detector, thus simplifying the electronic processor. Because of the broad-band illumination in such a system, the cross-spectral density function analysis is needed to predict the system performance.
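The wavelength scaling mentioned above follows from the grating equation, sin θ_m = mλ/d: a given ruling diffracts each color through a different angle, so the spatial scale of the operation it introduces differs from color to color. The short Python sketch below evaluates first-order angles for a hypothetical 25 µm-period Ronchi ruling (40 lines/mm, chosen only for illustration) at visible and 10 µm infrared wavelengths; it is not a model of the systems in refs. 4 and 5.

```python
import math


def diffraction_angle_deg(wavelength_um, period_um, order=1):
    """Angle of a diffraction order for normal incidence on a grating of the
    given period; returns None if that order is evanescent (|sin| > 1)."""
    s = order * wavelength_um / period_um
    if abs(s) > 1:
        return None
    return math.degrees(math.asin(s))


period = 25.0  # µm, hypothetical Ronchi ruling (40 lines/mm)
for wavelength in (0.45, 0.55, 0.65, 10.0):
    angle = diffraction_angle_deg(wavelength, period)
    print(f"{wavelength:5.2f} um -> {angle:6.3f} deg")
```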

In addition, the cross-spectral density function leads to an analysis that, though complicated, can lend itself to simple interpretation.

In current thinking, a digital processor or optical computer would contain several bistable devices to perform thresholding. To take advantage of the two-dimensional nature of optical propagation, there would likely be two-dimensional arrays of these devices. There might also be two-dimensional arrays of laser diodes used as light sources. Although each diode would itself be spatially coherent, it would be incoherent with respect to its neighbors. Because they are intensity-dependent devices, each bistable device in an array would be incoherent with respect to the other devices. Thus, a digital optical processor would essentially be a spatially (but not temporally) incoherent system. As light propagates through the system, information from each cell of the bistable array is mixed with that from the other cells. This mixing, if it can be understood and controlled, can be used to form interconnections between cells.

A  coherence  analysis  would  show  how  best  to  take  advantage  of  the 
effects  of  non-coherent  propagation.  By  introducing  generalized 
gratings  and  pupils  (possibly  programmable)  the  operation  of  the 
optical  computer  could  be  enhanced. 
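The distinction that matters here is whether amplitudes or intensities add. The Python sketch below compares two equal-strength sources added coherently (fixed relative phase) with the same sources added incoherently, modeled, as an assumption, by averaging over a uniformly random relative phase, as for independent laser diodes observed over many coherence times. Coherent addition gives anywhere from 0 to 4 times the single-source intensity depending on phase, whereas incoherent addition gives twice the single-source intensity, which is why mixing in a spatially incoherent processor can be described in terms of intensities alone.

```python
import cmath
import math
import random


def coherent_intensity(a1, a2, phase):
    """Intensity of two superposed fields with a fixed relative phase."""
    return abs(a1 + a2 * cmath.exp(1j * phase)) ** 2


def incoherent_intensity(a1, a2, n_samples=20000, seed=0):
    """Average of the coherent result over a uniformly random relative phase
    (a modeling assumption for mutually incoherent sources)."""
    rng = random.Random(seed)
    total = sum(coherent_intensity(a1, a2, rng.uniform(0.0, 2 * math.pi))
                for _ in range(n_samples))
    return total / n_samples


print(coherent_intensity(1.0, 1.0, 0.0))      # 4.0: constructive interference
print(coherent_intensity(1.0, 1.0, math.pi))  # essentially 0: destructive
print(incoherent_intensity(1.0, 1.0))         # close to 2.0 = |a1|**2 + |a2|**2
```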
STABILIZATION AND CONTROL TECHNIQUE
WITH  IMPROVED  PRECISION 

Gordon  R.  Little 

University  of  Dayton  Research  Institute 
Dayton,  Ohio  45469 

Stabilization and control of the phases of mutually coherent optical sources are important in many areas of modern optics, including holography1-4 and coherent optical computing.5 Over the years several active schemes for achieving phase stabilization and control have been reported.1-5 The method of MacQuigg3 is relatively easy to implement but yields limited phase control, while that of Rakhimov and Tronko4 requires the use of magneto-optic gratings and imposes overly strong restrictions on the source geometry. We report a simple modification of MacQuigg's method that yields good phase control in an easily implemented system.

The operating principles of MacQuigg's method are illustrated in Figure 1. Here, it is desired to stabilize and control source S2 relative to source S1. A simple phase control grating is fabricated by overlapping the two beams and recording a hologram. When the hologram is placed back in its original position and rotated through a small angle about an axis parallel to the grating lines, straight, equally spaced fringes appear in the two beams leaving the grating. These fringes are due to the interference of the S1 and S2 beams with the tilted reconstructed waves. A detector is placed in one of the beams, and its output is measured with a lock-in amplifier. The oscillator output from the lock-in amplifier drives a phase shifter on S2, providing a small dither in the fringe positions. Since the lock-in amplifier output is zero when the fringe pattern is at either a maximum or a minimum on the detector, this output can be used as an error signal in a closed-loop feedback system, thereby stabilizing the phase of S2 relative to S1.
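Why the lock-in output can serve as an error signal can be checked with a simple model. Assume, for illustration only, that the detector sees a sinusoidal fringe, I = A + B cos φ, and that the phase shifter adds a small sinusoidal dither, φ = φ0 + δ sin ωt. Demodulating at the dither frequency gives an output proportional to -2B J1(δ) sin φ0, which is zero at a fringe maximum (φ0 = 0) or minimum (φ0 = π) and changes sign on either side, so it tells the loop which way to correct. The Python sketch below performs the demodulation numerically over one dither period; the parameter values are arbitrary.

```python
import math


def lockin_output(phi0, dither=0.2, a=1.0, b=0.8, n=4096):
    """In-phase lock-in output for a detector signal I = a + b*cos(phase),
    with phase = phi0 + dither*sin(wt), averaged over one dither period."""
    total = 0.0
    for k in range(n):
        wt = 2 * math.pi * k / n
        intensity = a + b * math.cos(phi0 + dither * math.sin(wt))
        total += intensity * math.sin(wt)   # multiply by the reference
    return 2 * total / n                    # sine amplitude at the dither frequency


for phi0 in (-0.5, 0.0, 0.5, math.pi):
    print(f"phi0 = {phi0:+.3f} rad -> error signal {lockin_output(phi0):+.5f}")
```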
Phase control in this scheme was achieved by translating the grating in the direction perpendicular to the fringe orientation. However, since the grating spacing is of the order of the source wavelength, accurate phase control requires extremely high resolution in the translation device. For example, with the 0.488 µm argon-ion laser line and a source angular separation of 10°, a grating translation of 0.08 µm corresponds to a 10° phase shift. Clearly, phase control to 1° would be difficult using this approach.
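The figures quoted above can be reproduced with a few lines of arithmetic, assuming that the 10° is the full angle between two symmetrically incident beams, so that the recorded grating period is d = λ / [2 sin(θ/2)], and that translating the grating by x shifts the phase by 360° x/d. The Python sketch below is only that arithmetic.

```python
import math


def grating_period_um(wavelength_um, full_angle_deg):
    """Period of the grating recorded by two beams crossing at full_angle_deg
    (symmetric geometry assumed)."""
    return wavelength_um / (2 * math.sin(math.radians(full_angle_deg) / 2))


def translation_for_phase_um(period_um, phase_deg):
    """Grating translation that shifts the fringe phase by phase_deg."""
    return period_um * phase_deg / 360.0


d = grating_period_um(0.488, 10.0)
print(round(d, 2))                                   # 2.8 µm grating period
print(round(translation_for_phase_um(d, 10.0), 3))   # 0.078 µm for 10 degrees
print(round(translation_for_phase_um(d, 1.0), 4))    # 0.0078 µm (about 8 nm) for 1 degree
```

A 1° step thus needs roughly 8 nm of grating motion, which is why grating translation is impractical for fine control.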

To achieve greater accuracy in phase control, we suggest translating the feedback loop detector rather than the grating. Since the fringe spacing can easily be made relatively large by controlling the grating tilt, and since translating the detector through one fringe corresponds to a 360° phase change, the phase control resolution can be much greater, independent of the actual grating spacing. The phase control is calibrated by monitoring the second fringe pattern with an auxiliary detector. A typical calibration is shown in Figure 2, where the output I of the monitoring detector (A) is plotted as a function of the position x of the feedback loop detector. Also shown in the figure (solid curve) is the best-fit curve of the form

$$I = a_4 + a_3 \sin\left[2\pi a_2 (x - a_1)\right]$$

In this case, $a_2$ was found to be 0.2718 ± 0.0001 cycle/mm, yielding a phase control calibration factor of 97.85 ± 0.04°/mm. A simple micrometer-driven translation stage with 0.01 mm resolution could easily yield 1° phase settability. It should be noted, however, that it is the mean phase that can be controlled to this level. Short-term phase fluctuations, due to the dither for example, will be considerably larger. In many applications, such as fabricating holograms for use in coherent optical processors, the short-term fluctuations will only slightly reduce the hologram contrast.
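The calibration factor follows directly from the fitted spatial frequency: moving the feedback detector through one fringe period, 1/a2, changes the phase by 360°, so the factor is 360° × a2. The Python sketch below repeats that arithmetic with the quoted fit values; it does not reproduce the curve fit of Figure 2 itself.

```python
def calibration_deg_per_mm(a2_cycles_per_mm, a2_uncertainty=0.0):
    """Phase change per mm of detector travel, and its uncertainty."""
    return 360.0 * a2_cycles_per_mm, 360.0 * a2_uncertainty


factor, err = calibration_deg_per_mm(0.2718, 0.0001)
print(round(factor, 2), round(err, 2))  # 97.85 0.04 (degrees per mm)
print(round(1.0 / 0.2718, 2))           # 3.68 mm fringe period
print(round(factor * 0.01, 2))          # 0.98 degrees per 0.01 mm stage step
```

The mm-scale fringe period, compared with the µm-scale grating period, is the whole source of the improved settability.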
A simple variant of the detection scheme makes more efficient use of the available optical power. Here a mask with slit apertures matching the fringes is placed in the pattern, and a lens focuses all of the passed radiation onto the detector. The mask, rather than the detector, is translated to achieve phase control.

To stabilize and control multiple sources, the phase control hologram can be fabricated using multiple, non-overlapping exposures of pairs of the sources. For N sources (N - 1 to be controlled relative to the first), there will be N fringe patterns. The one associated with the reference source will contain N - 1 fringe pattern segments, while the others will each have one segment. The N - 1 single-segment patterns can be used in feedback loops for stabilization and control, while the composite can be used for calibration. Photographs of the fringe patterns for a four-source system are shown in Figure 3. The different fringe spacings are due to the differing angular separations of the sources.
Figure 1. Phase Stabilization and Control Scheme of MacQuigg.
INTRODUCTION


The research efforts reported here cover work done in a six-month initial-phase project consisting of design and consulting tasks. Limited work of a theoretical nature was carried out, and two presentations were given (one at the SPIE Annual Meeting in San Diego and one at the OSA Meeting in Washington, DC). No experimental or fabrication effort was planned or carried out.

The two short sections that follow describe work on Distributed Threshold Computing implementations and on the preliminary design of a Pipelined Polynomial Processor. Then the principal thrust of our work, the All-Optical Analog/Digital Converter, is described in some detail. An integrated-optic layout for this device is presented, and the needed analysis is discussed. A copy of a joint publication (with UDRI staff) is attached as an appendix.


DISTRIBUTED  THRESHOLD  COMPUTING 

This activity was performed in cooperation with researchers from UDRI; it consisted primarily of looking for ways to implement the architectures of interest to UDRI in integrated-optic format. As this was a support activity and no device parameters were identified, little detailed design was carried out. Some design work was done to devise a way to implement a 2x2 multiplier by direct implementation of the multiplication table in threshold logic. We found two possible implementations, both using waveguide horn structures. In the first, the entire logic takes place in the horns; in the second, the horns serve only to convey the light to a multimode region where collection gratings assemble the output light into the proper directions. This work was dropped because there was insufficient time for the UDRI researchers to fully develop their concepts, and because of attention to the A/D converter described below.


PIPELINED  POLYNOMIAL  PROCESSOR 

A modest effort in preliminary grating design was made in support of this processor concept. The device is based upon the paper by Verber et al.1 The device, as envisioned for integrated-optical implementation, uses two kinds of optical gratings: (1) an "adder" grating, used as a beam combiner to combine the pipeline mainstream data with new coefficient data; and (2) a multiplier, to be constructed using an electrooptic grating. The relevant features of these gratings are that type (1) is a holographic surface grating with a large deflection and a very narrow angular range of operation, while type (2) is formed photolithographically and therefore has a large period and a wide angular range of acceptance, but a small deflection. For this report, further details on the device layout and operation are not relevant.
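A pipeline that repeatedly multiplies the mainstream data and then adds in a new coefficient is, assuming it follows the usual nested-multiplication (Horner) scheme, equivalent to evaluating p(x) = (...((a_n x + a_{n-1}) x + a_{n-2}) ...) x + a_0. The Python sketch below shows that arithmetic stage by stage. The mapping of the multiply to the electrooptic grating and the add to the holographic adder grating follows the description above, but the exact data flow of the Verber et al. device is not reproduced here, and signed values are used freely although an optical implementation would need some encoding for them.

```python
def horner_pipeline(coeffs, x):
    """Evaluate a polynomial by nested multiplication, one multiply-add per stage.

    coeffs runs from the highest power down to a_0; returns the result and
    the value carried in the 'mainstream' after each stage.
    """
    stream = coeffs[0]
    trace = [stream]
    for a in coeffs[1:]:
        stream = stream * x   # multiplier stage (electrooptic grating in the text)
        stream = stream + a   # adder stage: combine with the next coefficient
        trace.append(stream)
    return stream, trace


value, trace = horner_pipeline([2, -3, 0, 5], 2)   # 2x^3 - 3x^2 + 5 at x = 2
print(value, trace)                                 # 9 [2, 1, 2, 9]
```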

The work done on this project consisted of looking at possible geometries for adder gratings (type 1) and their impact on the area of crystal required. Two geometries were considered: (1) a "crossed-beam" geometry, where the grating occupies only the region of intersection of the beams being combined; and (2) a Kogelnik2 type of grating, where the grating extends outside the region of intersection. It was concluded that neither design offers a strong "real estate" advantage, but that the Kogelnik type of grating might be easier to fabricate because of looser alignment tolerances on the beam paths.
ALL-OPTICAL A/D CONVERTER
INTRODUCTION 


By an all-optical analog-to-digital converter we mean an A/D that accepts an optical input whose intensity is an analog representation of a numerical value, and whose output is an optical binary representation of that analog value. We have chosen, for reasons of compactness, stability, and power consumption, to consider an integrated optical design. The integrated optical A/D is intended to be a general-purpose device that could be incorporated into a variety of systems, although the primary motivation for this work was the need for A/D conversion at the output of an integrated optical DMAC (optical digital multiplication by analog convolution) device. This work led not only to a novel architecture for the optical A/D but also to concepts for waveguide devices that, through the proper use of nonlinear optical materials, could perform the required nonlinear operations.

BASIC  DESIGN 

The design of the A/D, as shown in Figure 1, is an adaptation of the electronic flash converter. It consists of:

• An analog input channel,

• Taps for removing equal amounts of energy from the input channel,

• Optical thresholding devices or comparators, which pass a fixed signal when the input exceeds the threshold and no energy when the signal is below the threshold,

• A set of XOR gates,

• A distribution network, and

• A set of binary output channels.
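The logical operation of the flash architecture can be summarized in a few lines. In the idealized Python sketch below, the comparators produce a "thermometer" code, XORs of adjacent comparator outputs leave a single active line, and the distribution network routes that line to every output bit that is 1 in its index. Thresholds at equal steps, an always-on comparator below the first level, and most-significant-bit-first output order are assumptions made for illustration; tap losses and finite transition widths are ignored.

```python
def flash_adc(intensity, full_scale, bits):
    """Idealized flash conversion of an analog intensity to a list of bits (MSB first)."""
    levels = 2 ** bits
    step = full_scale / levels
    # Comparator k (k = 1 .. levels-1) turns on when the input reaches k * step.
    thermometer = [1 if intensity >= k * step else 0 for k in range(1, levels)]
    # An always-on entry below and an always-off entry above the comparators, so
    # the XOR of neighbors marks exactly one of the `levels` possible codes.
    padded = [1] + thermometer + [0]
    one_hot = [padded[k] ^ padded[k + 1] for k in range(levels)]
    # Distribution network: output bit b is lit by any active line whose index has bit b set.
    return [int(any(one_hot[k] for k in range(levels) if (k >> b) & 1))
            for b in reversed(range(bits))]


print(flash_adc(0.40, 1.0, 3))  # [0, 1, 1]: 0.40 lies in the fourth of eight levels (code 3)
```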
Figure 1. Optical version of the flash A/D converter.

There are a number of factors that must be considered more or less simultaneously when weighing design alternatives for an integrated-optic (I.O.) implementation of the A/D:

•  Maintain  reasonable  overall  size. 

•  Provide  suitably  sharp  thresholding  action. 

•  Maintain  outputs  from  the  threshold  elements 
consistent  with  XOR  requirements. 

•  Aim  for  uniform  loss  in  distribution  network. 


THE COMPONENT REQUIREMENTS

The threshold device and the XOR are two "active" devices, which must be designed so that they interact properly and can be fabricated compatibly on the same substrate. Initially, the principal design decision was thought to be the form of the threshold device, since this affects the design of the taps and even of the analog input channel. However, it became apparent that the design of the XOR has an equal, if not greater, impact on the overall design.

The best-known integrated optical method for implementing the XOR is based upon the operation of the single-mode Y junction,3 although an equivalent operation can be performed with collimated beams in a planar waveguide using a surface-grating beam splitter. In both cases the XOR operation depends upon the phase difference between the two incident beams. This is a very restrictive requirement since, in the case of the A/D, it means that the phase of the light leaving each of the threshold elements must be known.
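The phase sensitivity of the Y-junction XOR can be illustrated with an idealized model. Assume a lossless, symmetric single-mode Y junction used as a combiner: the two input fields, with relative phase φ, project onto the output mode with power |a1 + a2 e^{iφ}|²/2, and the remainder is radiated. The Python sketch below tabulates the truth table for three phases; under this model only φ = π gives XOR behavior.

```python
import cmath
import math


def y_junction_output(a1, a2, phase):
    """Guided output power of an ideal symmetric Y-junction combiner (model assumption)."""
    return abs(a1 + a2 * cmath.exp(1j * phase)) ** 2 / 2


inputs = ((0, 0), (1, 0), (0, 1), (1, 1))
for phase in (math.pi, math.pi / 2, 0.0):
    row = [round(y_junction_output(a, b, phase), 3) for a, b in inputs]
    print(f"phase {math.degrees(phase):5.1f} deg: {row}")
# phase 180.0 deg: [0.0, 0.5, 0.5, 0.0]   XOR
# phase  90.0 deg: [0.0, 0.5, 0.5, 1.0]
# phase   0.0 deg: [0.0, 0.5, 0.5, 2.0]
```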
[...] entirely different design for an XOR gate, which [...] the work of Seaton, Stegeman and Winful4 on [...] guided wave phenomena. They discuss [...] devices in which the nonlinear properties [...] make the modal characteristics intensity dependent. Furthermore, they present data on liquid-crystal MBBA, [...]-V materials, and ZnS, suggesting that there are a number of materials of which experimental devices might be made. Ignoring, for the present, the details [...] questions, let us consider Figure 2, which suggests one way of using this nonlinear waveguide approach to design an XOR. The input waveguide is designed to be a single-mode guide that operates just above cutoff at low light intensities. The overlay material has a negative nonlinearity; that is, its index decreases with increasing intensity. As the intensity increases, the guide index drops until the mode is cut off and the guide ceases to transmit. As shown in Figure 3, it is also possible to accomplish the same function using a material with a positive nonlinearity.

Figure 2. Suggested design of an XOR gate using a waveguide overlay with a negative nonlinearity. As shown in the insert, a single input of intensity IA passes through the gate, but two simultaneous inputs with total intensity IA + IB will be ejected into the substrate.
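The intensity-threshold XOR can be summarized by an idealized transfer function: the guide transmits while the total input stays below a cutoff intensity Ic and ejects the light into the substrate above it. Assuming, for illustration, a sharp cutoff and inputs whose intensities simply add (as they would for mutually incoherent inputs), the gate computes XOR whenever each single input lies below Ic but the sum of the two reaches it. The Python sketch below states that condition; the names and the step-like transfer function are assumptions, and a real device would show a transition region.

```python
def nonlinear_xor(i_a, i_b, cutoff):
    """Idealized cutoff guide with a negative nonlinearity: the total input
    intensity is transmitted if it is below the cutoff, ejected otherwise."""
    total = i_a + i_b
    return total if total < cutoff else 0.0


def xor_window_ok(i_a, i_b, cutoff):
    """True when single inputs pass but the pair is cut off."""
    return max(i_a, i_b) < cutoff <= i_a + i_b


for a, b in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(a, b, nonlinear_xor(a, b, cutoff=1.5))
```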

The first approaches considered for the threshold device were variations of the nonlinear Fabry-Perot interferometer.5 However, we found that it is, in principle, possible to perform this function using a variant of the nonlinear waveguide. The basic threshold device is shown in Figure 4. It is designed so that the channel is cut off at low light intensities. Since the nonlinearity is positive, at high intensities the channel can be driven above cutoff. As can be seen, the output of such a device should go from zero, through a transition region, and then become a linear function of the input. It is, of course, desirable that the output of the threshold device be independent of the input signal strength once the input has exceeded the threshold value.

Figure 3. Suggested design of an XOR gate using a waveguide overlay with a positive nonlinearity. At a sufficiently high input intensity the guided wave will be lifted out of the waveguide, resulting in a characteristic similar to that shown in Figure 2.

Single-mode guide in tf region
Cut off in tr region for low intensities
Guides in V region for high intensities

Figure 4. Suggested design of a waveguide threshold device using a waveguide overlay with a positive nonlinearity. As shown in the insert, the output is a linear function of the input when the input is above threshold.

It will be shown that the proper threshold characteristic, as well as other advantages, results from the device geometry shown in Figure 5. By separating the output of the device from the control signal (which in our case is the analog input), we not only achieve the desired gate characteristic but also obtain a simple solution to the problem of energy distribution at the front end of the device. As shown in Figure 6, the input distribution is accomplished by using an input channel with a finite attenuation that crosses a set of identical threshold units. By the proper logarithmic spacing of these units, we can design them so that each will turn on at the appropriate value of the input intensity.

Figure 5. Crossed-channel threshold device geometry. The passive output channel is below cutoff until the signal in the control channel is high enough to "burrow through" the thinned crossing region, which is made of a material with a positive nonlinearity. Increasing the index in this region also raises the controlled channel above cutoff and results in the characteristic shown in the insert.
Figure 6. The input signal Is enters a channel with a fixed attenuation α. This channel intersects a set of identical threshold devices of the type shown in Figure 5. Since the spacing is logarithmic in the attenuation of the input channel, each threshold unit will turn on at the indicated value even though the units are identical.
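The logarithmic spacing can be made explicit. Suppose, as an idealization, that the input intensity decays as I(z) = Is exp(-αz) along the channel, that each identical unit switches on when the intensity at its crossing reaches Ith, and that the units remove negligible power. Unit k then turns on at Is = kΔ if it sits at z_k = ln(kΔ/Ith)/α, so the gaps between successive units shrink as ln[(k+1)/k]/α. The Python sketch below computes these positions; the step Δ, threshold, and attenuation values are hypothetical.

```python
import math


def unit_positions(threshold, step, n_units, alpha):
    """Crossing positions z_k (same length unit as 1/alpha) at which identical
    threshold units turn on for input levels k * step, k = 1 .. n_units."""
    if step < threshold:
        raise ValueError("the first level must not lie below the unit threshold")
    return [math.log(k * step / threshold) / alpha for k in range(1, n_units + 1)]


def units_on(i_s, positions, threshold, alpha):
    """Number of units whose local intensity I_s * exp(-alpha * z) reaches threshold."""
    return sum(1 for z in positions if i_s * math.exp(-alpha * z) >= threshold)


alpha = 0.5  # hypothetical attenuation, 1/mm
positions = unit_positions(threshold=1.0, step=1.0, n_units=7, alpha=alpha)
print([round(z, 3) for z in positions])      # starts at 0.0; the gaps then shrink
print(units_on(3.5, positions, 1.0, alpha))  # 3 units on for an input of 3.5 steps
```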

A suggested layout for the entire A/D is shown in Figure 7. The outputs of the XOR gates enter a planar waveguide region, and the distribution of the light into the binary output channels is accomplished by holographic optical elements whose design will not be discussed here.

The major development required for the successful implementation of the all-optical A/D is that of the nonlinear elements. This must proceed through a careful theoretical description of the elements, which will lead to an ability to predict device performance as a function of the optical properties of the materials considered for device fabrication. This question is dealt with in more detail in the following section.

ANALYSIS OF NONLINEAR COMPONENTS

The nonlinear waveguide components envisioned for use in the Integrated Optic A/D Converter (IOAD) device are a totally new concept, neither analyzed nor constructed in previous work. There are fundamental questions regarding the propagation of light in nonlinear structures, questions that will require extensive analysis to resolve. In this section, we discuss some of the theoretical questions to be answered.

Figure 7. Schematic of the entire optical A/D. Grating beam splitters are suggested to divide the outputs of the threshold elements and to combine the inputs to the XOR gates (which are shown as N). This will work properly if adjacent channels are mutually incoherent. As shown in the figure, each channel is fed by its own laser diode.

Wave  Propagation  in  Nonlinear  Media 


A number of authors have analyzed wave propagation in
nonlinear media; much of this work is reviewed by Akhmanov et al.6
Among  the  important  results  is  the  instability  of  homogeneous 
plane  waves  in  nonlinear  media.  When  the  intensity  of  the  waves 
becomes  large  enough  that  nonlinear  effects  are  important,  a  plane 
wave  may  become  unstable  and  break  up  into  an  inhomogeneous, 
filamentary  structure. 

In linear media, it is often permissible to treat a finite optical beam having a width of many wavelengths as if it were, effectively, a plane wave. In Kerr media, i.e., media having a dielectric constant that depends linearly on the intensity of light, this assumption is no longer correct. The spatial transients occurring at the edge of a finite beam are of supreme importance in determining how the beam will propagate. In ref. 6, it is shown that such media may develop waveguides by "self-action"; that is, the finite light beam interacts with the medium to produce a stable index variation that traps the beam.

When an interface with another medium, even a linear one, is introduced, the dynamics of beam propagation become more complex. A number of Soviet authors have published on this subject, one of the earlier ones being Bioko et al.,7 who derive expressions for plane waves incident upon a Kerr medium at an interface with a linear medium. For partial reflection, they allow a plane wave to be transmitted into the nonlinear medium (but they remark that such a wave is unstable). At total reflection, they use an inhomogeneous plane wave and derive the form of this wave for a semi-infinite nonlinear medium. A series of papers by Rozanov8-11 discusses various aspects of the interactions with nonlinear interfaces. Kaplan12,13 analyzed hysteresis effects in the reflectivity using a plane-wave analysis (criticized by Rozanov10). These works generated a controversy concerning the behavior of light in a nonlinear medium, the effects of finite beams in contrast to infinite plane waves, and the effect of a single interface on the interactions.

Part of the controversy was resolved in a series of works by Smith et al.14-17 in which they carefully unravel an early claim of optical bistability.18 It was initially found that hysteresis in the reflectivity was indeed observed at a nonlinear interface when pulsed light was used; this was initially interpreted as evidence of bistability. However, the details of the measurements did not agree well with Kaplan's predictions.12,13 Analysis of the situation and experiments with slower pulses revealed that bistability was unproven. Experiments with a medium having a nonlinearity large enough to be used with cw light then showed that there is also no real hysteresis, the apparent hysteresis being attributed to the short pulses used.


Propagation  in  Nonlinear  Waveguides 

In waveguide devices having nonlinear bounding media, the same kinds of situations must be dealt with as those discussed above. Initial analyses of nonlinear guided waves have been made by Seaton et al.19,20. In these papers, the waveguide itself is taken as linear, with nonlinear (Kerr) bounding media that are semi-infinite in extent. These papers provide the starting point for analysis of the devices discussed here.


The devices envisioned here have nonlinear media of finite thickness; no solution in simple functions can be obtained for this case, and elliptic functions and integrals appear instead. Furthermore, our devices are channelized, i.e., they are bounded laterally as well. The crossing configuration means that spatial transients will be significant. For example, a wave propagating in the linear waveguide, upon encountering the nonlinear waveguide crossing it, will be partially reflected, and some of the light will also be lost into the substrate. A standing-wave pattern of light will therefore be set up, at least at the edge of the nonlinear medium. This may cause two phenomena. First, the standing-wave pattern of intense light forms a grating that converts the incoming light into reflected light or into light ejected into the substrate. It is therefore possible that increasing the light intensity may merely increase the lost light rather than produce a transition into a transmitting state, as desired for the threshold unit. Second, the nonuniform intensity of light will lead to a nonuniform interaction with the material; such interactions are more difficult to analyze.
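As a rough illustration of the first phenomenon, the sketch below evaluates the intensity of a unit forward wave superposed with a partially reflected counter-propagating wave, |exp(ikz) + r exp(-ikz)|^2 with k = 2πn/λ. It assumes plane waves, a real amplitude reflection coefficient r, and illustrative values of λ, n, and r that are not taken from the devices discussed here. The intensity fringes repeat every λ/(2n), which is also the Bragg period for reflecting light of the same wavelength straight back; an intensity-dependent index that follows these fringes therefore behaves like a reflecting grating.

```python
import cmath
import math

def standing_wave_intensity(z, wavelength, n_index, r):
    """Intensity of a unit forward wave plus a counter-propagating wave of amplitude r.

    Plane-wave model (an assumption): E(z) = exp(ikz) + r * exp(-ikz), k = 2*pi*n/wavelength.
    """
    k = 2 * math.pi * n_index / wavelength
    field = cmath.exp(1j * k * z) + r * cmath.exp(-1j * k * z)
    return abs(field) ** 2

def fringe_period(wavelength, n_index):
    """Spacing of the intensity maxima, lambda / (2 n): also the first-order Bragg
    period for reflecting light of the same wavelength straight back."""
    return wavelength / (2 * n_index)

# Illustrative (hypothetical) values: 1.06 um light, index 1.5, 30% amplitude reflection.
lam, n, r = 1.06e-6, 1.5, 0.3
period = fringe_period(lam, n)
samples = [standing_wave_intensity(z * period / 50, lam, n, r) for z in range(101)]
print(f"fringe period = {period * 1e9:.1f} nm")
print(f"intensity range: {min(samples):.3f} to {max(samples):.3f}")  # (1 - r)^2 to (1 + r)^2
```

The fringe contrast grows with r, so the stronger the partial reflection at the crossing, the stronger the induced grating would be in a Kerr-like medium.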

The preceding remarks indicate that analysis of the devices may involve new theoretical developments. This may be the case; however, there has been much work, especially in the Soviet Union (referenced above), that lays the groundwork. Furthermore, the analyses made so far involve Kerr media, without regard to saturation. Real nonlinear media saturate with an index change that is only 1-3% of the low-intensity index. Hence, the more severe effects may not occur. There is good reason to expect that reasonable devices can be designed, analyzed and made, and that they will perform as expected. It will, of course, be important, in a world of technology accustomed to thinking in linear terms (additivity, superposition, decomposition, plane-wave analyses), to be especially careful to avoid analytical and experimental paths that do not lead where linear thinking would presume.
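To make the contrast between Kerr and saturating media concrete, the following sketch compares the unbounded Kerr law n = n0 + n2·I with one common saturable form, n = n0 + Δn_sat·I/(I + I_sat). The saturable form, the parameter names, and the numbers (n0 = 1.5, Δn_sat set to 2% of n0, n2 chosen so that both laws agree at low intensity) are illustrative assumptions, not values from this report.

```python
def kerr_index(intensity, n0, n2):
    """Pure Kerr law: the index change grows without bound with intensity."""
    return n0 + n2 * intensity

def saturable_index(intensity, n0, dn_sat, i_sat):
    """Assumed saturable model: the index change levels off at dn_sat."""
    return n0 + dn_sat * intensity / (intensity + i_sat)

n0 = 1.5
dn_sat = 0.02 * n0          # 2 % of the low-intensity index (illustrative)
i_sat = 1.0                 # saturation intensity, arbitrary units
n2 = dn_sat / i_sat         # makes both laws identical in the low-intensity limit

for i in (0.01, 0.1, 1.0, 10.0, 100.0):
    dk = kerr_index(i, n0, n2) - n0
    ds = saturable_index(i, n0, dn_sat, i_sat) - n0
    print(f"I = {i:7.2f}   Kerr dn = {dk:.4f}   saturable dn = {ds:.4f}")
```

Well above I_sat the Kerr model keeps predicting ever larger index changes, while the saturable model stays below Δn_sat; this is the sense in which the more severe Kerr-medium effects may not occur in real materials.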
Abstract

Optically implemented threshold logic systems that are characterized by thresholding operations concentrated at one functional location are considered. The objective is to identify architectures and associated integrated optical and holographic techniques that might be used to design superior register-level computation modules. A complete design for a lumped threshold 2-bit multiplier is presented as an example, and methods for general lumped threshold module synthesis are discussed.

Introduction

Threshold logic has received attention recently because it may provide significant performance advantages for a broad range of mathematical operations and because it may have efficient optical implementations.1-3 This paper considers lumped threshold logic systems, which are defined as systems in which thresholding operations are concentrated at one functional location. The objective is to identify architectures and associated holographic and integrated optical techniques that might be used to design register-level computation modules, such as pure-radix multipliers, multiply-accumulators, etc., with commanding and enduring advantages in speed, power consumption, size, fault tolerance, etc., over current and projected all-electronic alternatives.

Figure 1 shows two general types of systems that relate sets of inputs and outputs using weighting and thresholding operations. Here "weighting" refers to interconnects with selected connection strengths, and "thresholding" refers to decisions based on inequality criteria. Distributed threshold logic systems generally have numerous distinct elements of the same type, each of which performs weighting and thresholding functions. For example, each element could conceivably be an optical-input-optical-output multiple quantum well (MQW) gate, and all elements could be optically interconnected so that the thresholding operations would be distributed throughout the system. In contrast, lumped threshold logic systems generally have only two functional units, one for weighting and one for thresholding. For example, the weighting operation could be accomplished by passive or active (i.e., programmable) integrated optical diffracting elements, and the thresholding operation could be accomplished by photodetectors at an optical-to-electronic output interface. In this case the (nonlinear) thresholding operation is global, or concentrated at the output of the system.


Lumped Threshold 2-Bit Multiplier

The 2-bit multiplier may be used as a simple example of lumped threshold logic design. Figure 2 is a 2-bit multiplier truth table for the multiplication of binary numbers $x_1x_0$ and $y_1y_0$ to obtain $z_3z_2z_1z_0$. Suppose that the four input bits are represented by 0 if they are zero and by $x_1 = A_1 e^{i\theta_1}$, $x_0 = A_2 e^{i\theta_2}$, $y_1 = A_3 e^{i\theta_3}$, and $y_0 = A_4 e^{i\theta_4}$ if they are ones. If these expressions correspond to waves in a geometrical optics approximation, where all source-source, source-detector, and detector-detector distances are large compared to the wavelength, then the 2-bit multiplier may be designed as shown in Figure 3a. Here $x_0$, $x_1$, $y_0$, and $y_1$ are optical point sources, and $z_0$, $z_1$, $z_2$, and $z_3$ are point photodetectors. The lines indicate optical paths, each of which may have a selected attenuation and phase shift that might be implemented by a hologram or integrated optical diffracting elements.
The required attenuations and phase shifts may be obtained by solving sets of simultaneous nonlinear inequalities derived from the truth table. For example, the 12th row and the $z_2$ column of the table imply a signal at photodetector $z_2$ that must equal or exceed a threshold $T_2$:

$$\left|A_1 e^{i\theta_1} + A_3 e^{i\theta_3} + A_4 e^{i\theta_4}\right|^2 \ge T_2 \qquad (1)$$


Similar expressions may be obtained so that each of the four output columns (labeled $z_3$, $z_2$, $z_1$, and $z_0$) is described by a set of 16 (one for each table row) simultaneous nonlinear inequalities in 9 unknowns: four amplitudes ($A_1$, $A_2$, $A_3$, and $A_4$), four phases ($\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$), and one threshold ($T_1$, $T_2$, $T_3$, or $T_4$). Solutions are to be found for each of these four overdetermined inequality sets (which involve terms such as $A_1^2$, $2A_1A_2\cos(\theta_1-\theta_2)$, etc.) such that the amplitudes, phases, and thresholds obtained all have acceptable tolerances, or ranges over which they may vary without affecting proper 2-bit multiplier operation.
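One way to see what solving these inequality sets involves is a brute-force search. The sketch below is not the method used to obtain Figure 3b. It assumes unit amplitudes (the phase-only case), orders the 16 truth-table rows as $x_1x_0y_1y_0$ = 0000, 0001, ..., 1111 (which makes the 12th row 1011, consistent with Eq. (1)), and measures the threshold tolerance as the gap between the weakest "1" signal and the strongest "0" signal divided by the full signal range, which is our reading of the $\Delta T/R$ figure of merit. The phase grid and the names `threshold_margin` and `grid_search` are hypothetical.

```python
import cmath
import itertools
import math

# Truth-table rows (x1, x0, y1, y0) = 0000 ... 1111; inputs map to amplitudes/phases 1..4.
ROWS = list(itertools.product((0, 1), repeat=4))

def product_bit(row, k):
    """Bit z_k of the product (x1 x0) * (y1 y0)."""
    x1, x0, y1, y0 = row
    return (((2 * x1 + x0) * (2 * y1 + y0)) >> k) & 1

def intensity(row, phases, amps=(1.0, 1.0, 1.0, 1.0)):
    """Detector signal |sum of A_j exp(i theta_j) over the inputs that are on|^2."""
    field = sum(a * cmath.exp(1j * th) for on, a, th in zip(row, amps, phases) if on)
    return abs(field) ** 2

def threshold_margin(phases, k, amps=(1.0, 1.0, 1.0, 1.0)):
    """Admissible threshold window divided by the total signal range (Delta T / R).
    A negative value means no threshold separates the 1-rows from the 0-rows."""
    ones = [intensity(r, phases, amps) for r in ROWS if product_bit(r, k)]
    zeros = [intensity(r, phases, amps) for r in ROWS if not product_bit(r, k)]
    signals = ones + zeros
    return (min(ones) - max(zeros)) / (max(signals) - min(signals))

def grid_search(k, steps=12):
    """Phase-only search on a uniform grid; theta_1 is fixed at 0 because a common
    phase shift leaves every detected intensity unchanged."""
    grid = [2 * math.pi * n / steps for n in range(steps)]
    candidates = ((0.0,) + rest for rest in itertools.product(grid, repeat=3))
    best = max(candidates, key=lambda ph: threshold_margin(ph, k))
    return best, threshold_margin(best, k)

for k in range(4):
    phases, margin = grid_search(k)
    print(f"z{k}: best grid margin = {margin:.3f}, phases (deg) = "
          + ", ".join(f"{math.degrees(p):.0f}" for p in phases))
```

A coarse grid can miss narrow solutions, so a negative printed margin only means that no separating threshold exists at the sampled phases; a finer `steps` value or a local optimizer started from the best grid point could refine the result.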

One solution that involves only phase shifts (no attenuations) and that may have practical tolerances is given in Figure 3b, where the $\theta$ column gives the phase shifts required for each of the four paths (in order) to each detector, the $T$ column gives the threshold value for each detector when $A_1 = A_2 = A_3 = A_4 = 1$, and the $\Delta T/R$ column gives the fraction of the total signal range on each detector over which its threshold may vary. Figure 4 is a histogram of $\Delta T/R$ for output $z_2$, generated by selecting each of the four phases for this output randomly from normal distributions with means at their design values and standard deviations equal to arctan(0.1). These standard deviations correspond to a 10% displacement of the phase vectors, and Figure 4 shows that such variations reduce the threshold tolerance $\Delta T/R$ for output $z_2$ from 37% to about 20%. Similar acceptable tolerances may be obtained for the other outputs and should be amenable to engineering design.
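The tolerance study behind Figure 4 can be imitated with a short Monte Carlo sketch. Because the Figure 3b phases are not reproduced here, the design below is a hypothetical phase-only solution for output $z_0$ (nominal $\Delta T/R$ = 0.4 under the reading used in the code: threshold window divided by total signal range, with rows ordered $x_1x_0y_1y_0$ = 0000 to 1111). Each phase is redrawn from a normal distribution centered on its design value with standard deviation arctan(0.1), as described above, and the resulting margins are summarized and histogrammed.

```python
import cmath
import itertools
import math
import random

ROWS = list(itertools.product((0, 1), repeat=4))   # (x1, x0, y1, y0) = 0000 ... 1111

def margin(phases, k):
    """Delta T / R for output z_k with unit amplitudes (phase-only design)."""
    def signal(row):
        return abs(sum(cmath.exp(1j * th) for on, th in zip(row, phases) if on)) ** 2
    def bit(row):
        x1, x0, y1, y0 = row
        return (((2 * x1 + x0) * (2 * y1 + y0)) >> k) & 1
    ones = [signal(r) for r in ROWS if bit(r)]
    zeros = [signal(r) for r in ROWS if not bit(r)]
    return (min(ones) - max(zeros)) / (max(ones + zeros) - min(ones + zeros))

def perturbed_margins(design, k, trials=5000, sigma=math.atan(0.1), seed=0):
    """Margins obtained when every phase gets independent Gaussian noise of std sigma."""
    rng = random.Random(seed)
    return [margin([rng.gauss(th, sigma) for th in design], k) for _ in range(trials)]

design_z0 = (math.pi / 2, 0.0, -math.pi / 2, 0.0)   # hypothetical design for z0 = x0*y0
samples = perturbed_margins(design_z0, 0)
print(f"nominal margin: {margin(design_z0, 0):.3f}")
print(f"perturbed: mean {sum(samples) / len(samples):.3f}, worst {min(samples):.3f}")

# Crude text histogram of the perturbed margins, 0.05-wide bins.
bins = {}
for m in samples:
    b = math.floor(m / 0.05) * 0.05
    bins[b] = bins.get(b, 0) + 1
for b in sorted(bins):
    print(f"{b:5.2f} {'#' * (bins[b] // 50)}")
```

A standard deviation of arctan(0.1) is the rotation produced when a unit phase vector is displaced sideways by 0.1, which is where the 10% figure comes from.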

Optical  Implementations 

The 2-bit multiplier design described above is based on the ability of optics to provide noninterfering interconnections3 which (1) are parallel, in that interconnection time is essentially independent of interconnection length or weight, and which (2) lead to system operation times essentially limited only by the response times of sources or detectors.

These interconnections may be provided by passive diffracting elements in the form of an ordinary or bulk thin- or thick-film hologram in which light propagates approximately normal to the hologram plane. These interconnections may also be provided in integrated optical implementations by passive diffracting elements formed on or near a substrate surface, such that light propagates approximately parallel to the surface. Such integrated optical implementations could use surface relief or photorefractive mechanisms to form the diffracting elements on GaAs, LiNbO3, glass, or other substrates.

Integrated optics has potential for implementing lumped threshold computation modules with superior advantages in size, power consumption, reliability, etc. This technology also has potential for implementing real-time programmable interconnections or weightings using electro-optically modulated diffracting element structures (or, ultimately, all-optical nonlinear devices). This capability would be important, for example, for neural network architectures that could perform "intelligent" adaptive and symbolic processing.4 Figure 5a shows a direct implementation of programmable interconnections using a segmented-array electro-optic grating.5 Here the interconnections shown in Figure 3a are reordered so that no connection paths cross, by providing a uniform optical input and applying one of two voltages to the grating segments in accordance with the input bits $x_1$, $x_0$, $y_1$, and $y_0$. Note that the individual grating segments are tilted at angles $\theta_i$ so that the parallel input beams $S_i$ are directed to a detector with threshold $T$ after weight $W_i$ is applied, as determined by the programmable voltages $V_i$. However, since there must be a minimum deflection angle, the lateral separation must become large as the number of segments N increases. Figure 5b shows one way of circumventing this problem using an integrated optical lens, which also permits (1) a common tilt angle for all grating segments and (2) the elimination at a stop (or monitoring) of undiffracted light. Figure 6 shows an integrated electro-optical channel-guide implementation. Here the channels are addressed through horns, and phase modulation is provided by surface electrodes. The output horns terminate in multimode regions whose outputs are plane waves that impinge on a surface grating at the Bragg angle. The contributions from the individual horns mix in the grating and produce an output that is a function of the phase differences between all pairs of input beams. Advantages of this arrangement are compactness and isolation of the detector from stray light.

Figure 7 shows how a hologram might be optically generated for implementing certain input-output relationships or truth tables (including the 2-bit multiplier truth table) in a lumped threshold system. One possibility, the Fourier transform hologram in Figure 7a, is multiply recorded using object sources O related to output truth table elements and reference sources R related to input truth table elements. Note that these sources need not be evenly spaced. The Fourier transform reconstruction in Figure 7b generates output truth table elements A given input truth table elements C. Using standard models of the holographic process, it may be shown that the L×P output truth table matrix A is related to the N×P input truth table matrix C by

$$A = OR^*C \qquad (2)$$

where O and R are L×M and N×M matrices describing the complex amplitudes used in recording the M-fold exposed hologram, m = 1, 2, ..., M, p = 1, 2, ..., P, and * is the conjugate transpose operation. An important aspect of Eq. (2) is that, although many exposures may be used to record the hologram, the ability of the hologram to represent input-output relationships is described by no more than the LN complex elements of OR*. In the 2-bit multiplier, for example, where N = L = 4 and P = 16, only 16 complex parameters are available to relate 64 input bits to 64 output bits. This suggests that not all possible truth tables are realizable in an optically recorded hologram of the type considered here. An analogous situation is that not all logic functions can be implemented by single threshold logic elements.6
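The analogy with single threshold elements can be checked directly for two inputs. The sketch below tests each of the 16 two-variable Boolean functions for a realization of the form $f = [w_1x_1 + w_2x_2 \ge T]$, searching small integer weights and half-integer thresholds (a search range chosen for illustration that is wide enough for every two-input threshold function). Exactly two functions, XOR and XNOR, have no such realization, because their true and false input points cannot be separated by a straight line.

```python
import itertools

INPUTS = list(itertools.product((0, 1), repeat=2))   # (x1, x2) = 00, 01, 10, 11

def threshold_realization(truth):
    """Return (w1, w2, T) with f(x) = [w1*x1 + w2*x2 >= T] for every input, or None."""
    for w1, w2 in itertools.product(range(-2, 3), repeat=2):
        for t in (n + 0.5 for n in range(-5, 5)):
            if all((w1 * x1 + w2 * x2 >= t) == bool(out)
                   for (x1, x2), out in zip(INPUTS, truth)):
                return w1, w2, t
    return None

realizable = {}
for truth in itertools.product((0, 1), repeat=4):   # all 16 truth tables
    realizable[truth] = threshold_realization(truth)

missing = [t for t, r in realizable.items() if r is None]
print(f"{16 - len(missing)} of 16 functions are single-threshold realizable")
print("not realizable:", missing)   # XOR is (0, 1, 1, 0), XNOR is (1, 0, 0, 1)
```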

It would be useful to solve Eq. (2), at least approximately, for OR* in terms of C and A. This matrix equation is generally overdetermined, and least-squares or pseudoinverse methods might be used to obtain an approximate solution. The (row-by-row) least-squares solution, for example, is

$$OR^* = AC^*(CC^*)^{-1} \qquad (3)$$
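For the 2-bit multiplier, Eq. (3) can be evaluated directly. The NumPy sketch below builds C (4×16) and A (4×16) from the truth table, assuming the columns are ordered $x_1x_0y_1y_0$ = 0000 through 1111 and the rows of A are $z_3, z_2, z_1, z_0$. It computes OR* from Eq. (3) and cross-checks it against `numpy.linalg.lstsq`. The nonzero residual illustrates the point made above: 16 complex parameters cannot reproduce all 64 output bits linearly, so the thresholding stage (and, possibly, a further search) has to do the rest.

```python
import itertools
import numpy as np

rows = list(itertools.product((0, 1), repeat=4))   # (x1, x0, y1, y0) for p = 1..16

def product_bits(r):
    """(z3, z2, z1, z0) for the product (x1 x0) * (y1 y0)."""
    x1, x0, y1, y0 = r
    prod = (2 * x1 + x0) * (2 * y1 + y0)
    return [(prod >> k) & 1 for k in (3, 2, 1, 0)]

C = np.array(rows, dtype=complex).T                              # N x P = 4 x 16
A = np.array([product_bits(r) for r in rows], dtype=complex).T   # L x P = 4 x 16

# Eq. (3): row-by-row least squares, with * the conjugate transpose.
C_H = C.conj().T
W = A @ C_H @ np.linalg.inv(C @ C_H)                             # estimate of OR*, L x N

# Same problem posed as C^T W^T = A^T and solved by NumPy's least-squares routine.
W_lstsq = np.linalg.lstsq(C.T, A.T, rcond=None)[0].T

print(np.round(W.real, 3))
print("max |OR* C - A| =", np.abs(W @ C - A).max().round(3))
```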


While this solution may not yield the desired truth table realization in a lumped threshold system, it may serve as a starting point for a steepest descent or other computer search for desired solutions. Such solutions should maintain the desired input-output relationship when the matrix elements are varied over an acceptable tolerance range. The geometrical optics phase-only solution for the lumped threshold 2-bit multiplier described in Figure 3b is such a solution and may be used to derive an OR* matrix in which all elements have unit magnitude:


OR* = [4×4 matrix whose elements all have unit magnitude; its entries include 1, −1, (−1 ± i√3)/2, and (√15 + i)/4]     (4)


A particular implementation of this solution for holographic recording is O = the 4×4 identity matrix and R = (OR*)*. Note that although the above analysis implies three-dimensional holographic systems, integrated optical assemblies of diffracting elements similar in function to bulk holograms may be practical. This possibility is related to the observation that the multiple truth table "images" to be recorded and reconstructed, although often highly cross-correlated, may be relatively simple, low-resolution bright-spot/dark-spot patterns.


General  Lumped  Threshold  Module  Synthesis 

Optical or computer-generated hologram synthesis of the weighting or interconnecting units required for lumped threshold computation modules will generally require knowledge of the amplitude and phase patterns on the hologram that yield the correct truth table input-output behavior with maximum weight and threshold tolerances. In the case of the geometrical optics two-bit multiplier design of Figure 3, expressions governing input-output behavior were easily obtained. This favorable situation may be uncommon in the design of the generally smaller, more efficient lumped threshold modules for which geometrical optics approximations do not apply.

Consider, for example, the derivation of eight far-field holograms, each with the same two design parameters, that implement, in a lumped threshold system, the eight positive-threshold two-Boolean-variable functions (i.e., the eight of the sixteen functions for which two zero inputs yield a zero output). Figure 8 shows a simple format consisting of a screen with two pinholes separated by a distance y. One pinhole is covered by a phase-shifting film θ; there is a detector d and lower and upper mutually coherent point sources l and u. In the far-field approximation, the distances b and y and the wavelength λ = 2π/k must be small compared to the distance s. With this approximation and with b fixed, the problem reduces to finding values of y and θ such that the detected signals $I_l$ for only source l on, $I_u$ for only source u on, and $I_b$ for both sources on have all six possible inequality relationships. Referring to Figure 8 for definitions, the following approximate expressions may be derived:


$$A_l = e^{ik(w+s)} + e^{i[k(v+r)+\theta]}, \qquad A_u = e^{ik(w+s)} + e^{i[k(u+r)+\theta]},$$

$$I_l = |A_l|^2 \approx (ks)^2(x^2+ax)^2 + 2\eta(ks)(x^2+ax) + \eta^2,$$

$$I_u = |A_u|^2 \approx (ks)^2(x^2-ax)^2 + 2\eta(ks)(x^2-ax) + \eta^2,$$

$$I_b = |A_l + A_u|^2 \approx 2(ks)^2\left[(x^2+ax)^2 + (x^2-ax)^2\right] + 8\eta(ks)x^2 + 4\eta^2 - 4(ks)^2(ax)^2, \qquad (5)$$

where x = y/s, a = b/s, and η = θ − π ≈ 0. Figure 9 is a graph of the approximate expressions for $I_l$, $I_u$, and $I_b$ versus x for λ = 628 nm, b = 10 µm, s = 10 cm, and η on the order of 10⁻³.

Note that four of the six inequality relationships can be satisfied using the plotted values; the other two relationships can be satisfied for other values of η.
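Equation (5) is easy to explore numerically. The sketch below evaluates the approximate expressions for the Figure 9 parameters and reports which orderings of $I_l$, $I_u$, and $I_b$ occur as x is swept. The value η = 0.004, the sweep range, and the function names are assumptions for illustration. Eq. (5) has the form of a small-phase expansion ($I_l = \Delta_l^2$ with $\Delta_l = ks(x^2+ax) + \eta$, and likewise for $I_u$), so the sweep is kept to x values where these phases stay well below one radian.

```python
import math

LAM = 628e-9     # wavelength (m), as quoted for Figure 9
B = 10e-6        # b (m)
S = 10e-2        # s (m)
ETA = 0.004      # assumed value of eta = theta - pi

def intensities(x, eta=ETA, lam=LAM, b=B, s=S):
    """Approximate I_l, I_u, I_b of Eq. (5) as functions of x = y/s."""
    ks = 2 * math.pi / lam * s
    a = b / s
    i_l = (ks * (x * x + a * x)) ** 2 + 2 * eta * ks * (x * x + a * x) + eta ** 2
    i_u = (ks * (x * x - a * x)) ** 2 + 2 * eta * ks * (x * x - a * x) + eta ** 2
    i_b = (2 * ks ** 2 * ((x * x + a * x) ** 2 + (x * x - a * x) ** 2)
           + 8 * eta * ks * x * x + 4 * eta ** 2 - 4 * ks ** 2 * (a * x) ** 2)
    return i_l, i_u, i_b

def ordering(x, **kw):
    """Names of the three signals sorted from weakest to strongest at this x."""
    vals = dict(zip(("I_l", "I_u", "I_b"), intensities(x, **kw)))
    return tuple(sorted(vals, key=vals.get))

xs = [n * 1e-6 for n in range(1, 301)]          # x = y/s from 1e-6 to 3e-4
seen = {}
for x in xs:
    seen.setdefault(ordering(x), x)             # first x at which each ordering appears
for order, x in seen.items():
    print(" < ".join(order), f"(first at x = {x:.1e})")
print(len(seen), "of 6 orderings found for eta =", ETA)
```

Algebraically the $I_b$ line of Eq. (5) equals $(\Delta_l + \Delta_u)^2$, which is a convenient check on any reconstruction of these expressions; rerunning with other values of `eta` shows how the set of reachable orderings changes.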

The example of Figure 8 and Eqs. (5) indicates the possible complexity of general (physical optics) lumped threshold module synthesis. Greater complexity may be anticipated if Fresnel rather than Fraunhofer diffraction conditions are allowed and if the input-output truth tables are large. One approach to such synthesis problems is to perform additional post-photodetection processing and to employ logical reduction and residue arithmetic techniques to reduce the effective size of the truth tables to be realized.7 The approach considered here seeks alternatives to the requirements for conversion into and out of residue arithmetic and for additional all-electronic processing.

Further work on lumped threshold logic should emphasize studies of the types and sizes of realizable truth tables and should seek general methods for synthesizing holograms such that the required truth tables are realized with acceptable threshold and other tolerances. A straightforward but perhaps limited approach to these studies is to investigate the number of holograms with specified resolution and cross-correlation characteristics that can be multiplexed on a single recording medium. A more general approach may be to obtain expressions (generally large sets of overdetermined simultaneous nonlinear inequalities) that fully describe a desired optically implemented lumped threshold module, and to find optimal solutions for them using nonlinear programming techniques. This approach will probably require supercomputer facilities, but in many cases it may be the best approach for obtaining designs with optimum performance characteristics.
Abstract

Parallelism  and  global  connections  are  the  main  features 
of  optical  computers.  The  computation  power  of  these  features  is 
quantified  using  a  new  measure  based  on  the  degrees  of  freedom 
that  are  available.  This  measure  of  computation  power  is  then 
related  to  the  versatility  of  the  computer  and  the  complexity 
level  of  the  problems  it  can  tackle.  The  practical  significance 
is  demonstrated  by  optical  implementation  of  FPLA's  and 
associative  memories.  The  quantitative  measure  and  the 
methodology  are  equally  applicable  to  general  non-optical 
computing  systems. 
I. Introduction


In  contrast  with  the  digital  world,  it  is  much  easier  to 
achieve  global  communication  than  to  implement  sophisticated 
local  computation  in  optical  systems.  In  fact,  a  tremendous 
level  of  parallelism  can  be  achieved  with  the  simplest  form  of 
optical  systems.  The  power  of  parallelism  and  global 
communication  is  therefore  the  essential  merit  for  building 
optical  computers,  and  it  is  essential  to  assess  this  power  in 
quantitative  terms.  It  may  seem  at  first  glance  that  a  measure 
of  the  computation  power  of  global  interconnections  without 
regard  to  what  kind  of  computing  elements  are  being 
interconnected  has  no  real  significance,  since  the  situation  is 
different  if  for  example  we  are  connecting  Cray  computers  or  AND 
gates.  However,  if  we  are  able  to  assess  the  "contribution"  of 
interconnections  to  the  computation  power  of  the  whole  system,  we 
will  have  a  useful  measure  of  their  role.  In  this  paper,  we 
define  such  a  measure  based  on  the  computational  degrees  of 
freedom,  apply  it  to  optical  systems  to  assess  their  power, 
interpret  this  power  in  terms  of  optical  implementation  of  FPLA's 
and  associative  memories,  and  argue  that  the  measure  is  not 
restricted  to  optics  but  is  equally  valid  in  a  general  computing 
system. 

In section II, we familiarize the reader with optical
computers and hybrid systems of optical-digital computers. In
section  III,  we  discuss  computation  power  briefly  and  quote 
some  results  relating  the  versatility  of  a  computing  system  to 
its  ability  to  tackle  problems  of  a  certain  level  of  complexity. 
The  measure  of  computation  power  is  introduced  and  discussed  in 
section  IV,  and  then  applied  to  optical  computers.  In  section  V, 
two  classes  of  examples  are  discussed  to  show  how  the  measure  of 
computation  power  is  directly  reflected  in  practical  situations. 

II. Optical Computers

A schematic diagram of a generalized optical information processing system is shown in Figure 1. It consists of two planes that are optically interconnected. The interconnection pattern is determined by the optical system placed in the intervening space. If optical feedback is used, the two planes can be physically merged into one and two-way communication is possible. Analog signal processing can be performed with the general structure of Figure 1 by assigning analog weights to each interconnection between individual pixels at the input and output planes. Accumulation of the weighted samples at the output by either spatial or temporal integration results in the implementation of linear transformations of the data placed at the input plane. The implementation of nonlinear optical computers has also been proposed [1,2,3] using the same basic structure shown in Figure 1. In such systems, nonlinear computations are performed at the two planes by either optoelectronic semiconductor circuits [3] or nonlinear optical components [1,2]. In either case, the basic structure of the optical computer is very similar to that of an analog signal processor; however, there is a very important shift in emphasis and outlook. In an optical computer, the elementary computations are performed at the two planes, and the primary purpose of the optical system is to provide global communication among the computing elements. This is obviously true in the case of optical interconnection of electronic processing components [3], but it is equally true if optical logic elements [4] are used, since little or no advantage over electronics can be derived by configuring an optical logic circuit in a planar geometry.
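The difference in outlook can be sketched in a few lines. In the hypothetical model below, the optical interconnection between the planes is a weight matrix W (entry W[j][i] couples input pixel i to output pixel j), accumulation at each output pixel forms the weighted sum, and a thresholding step stands in for the nonlinear computing elements placed at the plane. Feeding the output back to the input imitates the merged-plane feedback configuration. The weights, threshold, and iteration count are illustrative choices only.

```python
def interconnect(weights, inputs):
    """Linear stage: each output pixel accumulates its weighted input pixels."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def threshold_plane(values, t=0.5):
    """Nonlinear stage at the plane: each element fires if its input reaches t."""
    return [1 if v >= t else 0 for v in values]

def run_with_feedback(weights, state, steps, t=0.5):
    """Merged-plane system: outputs are fed back as the next inputs."""
    history = [state]
    for _ in range(steps):
        state = threshold_plane(interconnect(weights, state), t)
        history.append(state)
    return history

# Hypothetical fully interconnected 4-element example: every element listens to all others.
W = [[0.0 if i == j else 0.4 for i in range(4)] for j in range(4)]
print(interconnect(W, [1, 0, 0, 0]))       # analog processor: a linear transformation
print(run_with_feedback(W, [1, 1, 0, 0], steps=3))
```

Without the threshold step the system is just the analog linear processor; with it, the same optical interconnection serves as the communication network between simple nonlinear elements.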

An optical computer in principle combines the best of two worlds. Planar technology (VLSI or optical) is used to perform nonlinear logic, and the third dimension is used to provide optical communication between the computing elements in the plane. Indeed, the primary motivation for the proposed development of optical computers has been the solution of existing and anticipated communication bottlenecks in VLSI [3,1]. The ability to configure the interconnection pattern in the third dimension is a unique property of optics: it not only provides an efficient means for parallel communication among computing elements that are well separated in the plane, but also makes global communication among all the computing elements possible in principle. In an optical computer of this type, the computing elements that are being interconnected perform relatively simple computations, since only relatively local interconnections are possible with a planar technology. The assessment of the potential power of optical computers then reduces to an assessment of the relative importance of the sophistication of the computing elements versus the communication capability among the elements. For illustration purposes, we present two hypothetical curves in Figures 2a and 2b. The vertical axis is the computational power of a system consisting of N parallel computing elements, plotted as a function of the number of interconnections among the elements. Several curves are drawn in each diagram, corresponding to different computational power of the individual computing elements. Figure 2a is drawn under the hypothesis that the communication capability is dominant: as the number of interconnections is increased, the computational power of the system rises, and for large, fully interconnected networks the computing power of the individual elements only marginally affects the overall system. The opposite situation is depicted in Figure 2b: interconnecting a large number of simple elements does not result in a powerful system, and a locally interconnected network of very powerful computers is essentially the best that can be done. If in reality the situation is as shown in Figure 2a, then we can be very optimistic about the prospects of optical computers. In this paper, we express our conviction that the situation is indeed as depicted in Figure 2a and present quantitative arguments in support of this conjecture.

III. Computation Power

Computation power (the variable plotted in Figure 2) is a quantity that must encompass two aspects. The first is what we call raw computing power: the maximum number of elementary operations that can be performed per unit time. The main factors affecting the raw computing power are technological (switching speed, for instance) and parallelism. On the other hand, a meaningful measure of computation power must include the ability of the computer to handle complex problems, and it is important to distinguish between large problems and complex problems. Forming the inner product of two large vectors, for example, is a problem of very low computational complexity, since only one operation per vector element is required. Clearly, parallelism can help solve this problem faster, since the products between the individual elements of the vectors can be formed separately; global interconnections are not essential, since the products can be added pairwise. How parallelism can help solve a problem of inherently high complexity is not equally clear. A complex problem has the property that local decisions cannot be made until essential information has been communicated from basically the entire input data [5]. Thus useful computation can progress only when all the input information has been considered by the individual elements. For a parallel processor, this implies that all partial results need to be globally communicated. It is this notion that forms the basis for our conviction that communication capability, rather than the capabilities of the local computing elements, becomes the dominant factor in determining the computation power of a highly parallel computing structure.
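The inner-product case can be made concrete. The sketch below forms all elementwise products in one parallel step and then adds them pairwise, level by level, as in a reduction tree; it counts the levels to show that N products are combined in ⌈log₂ N⌉ pairwise steps, each of which only combines two partial sums. The function name and the tree schedule are illustrative assumptions, not taken from the original.

```python
import math

def pairwise_inner_product(a, b):
    """Inner product via one parallel multiply step and a pairwise-addition tree.

    Returns (value, levels), where levels is the number of pairwise-addition steps.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    partial = [x * y for x, y in zip(a, b)]     # all N products are independent
    levels = 0
    while len(partial) > 1:
        # each new partial sum depends on just two earlier ones
        partial = [sum(partial[i:i + 2]) for i in range(0, len(partial), 2)]
        levels += 1
    return partial[0], levels

value, levels = pairwise_inner_product([1, 2, 3, 4], [5, 6, 7, 8])
print(value, levels)   # prints: 70 2
```

A complex problem in the sense used above offers no such decomposition: every partial result may depend on all of the inputs, so the pairwise schedule no longer suffices.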

Complex  problems  of  this  type  arise  in  disciplines  such  as 
pattern  recognition,  and  are  characterized  by  the  lack  of  a 
regular  structure  that  would  admit  a  short,  systematic  algorithm 
for  solution.  Instead,  the  problem  has  to  be  broken  into  a 
very  large  number  of  basically  unrelated  cases  that  must  be 
considered  in  order  to  solve  the  problem.  A  convenient  measure 
of  the  complexity  is  the  logarithm  of  the  number  of  cases,  namely 
the  entropy  H  of  the  problem  [5].  The  entroy  of  the  problem  is 
clearly  related  to  the  size  of  the  system  that  can  handle  it, 
since  the  system  must  keep  track  of  all  the  cases  and  the  number 
of  these  cases  is  exponential  in  the  entropy.  Indeed,  a  direct 
relation  between  the  entropy  H  and  the  cost  C  of  the  system  is 
derived  in  [5].  On  the  other  hand,  the  number  of  problems  that 
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can  be  handled  by  a  system/  by  proper  programming,  is  also 
directly  related  to  its  size.  For  example,  if  the  memory 
capacity  is  large,  the  number  of  programs  and  look-up  tables  that 
can  be  stored  is  also  large,  and  hence  the  number  of  different 
problems  that  can  be  tackled.  Through  these  two  relations,  we 
can  link  the  number  of  problems  that  can  be  tackled  to  the  level 
of complexity (or entropy) that can be handled [5]. In other
words,  a  computing  structure  that  solves  a  certain  class  of 
complex  problems  also  solves  a  large  number  of  different 
problems,  and  vice  versa. 

When  we  interpret  these  relations  in  terms  of  optical 
computers,  a  definition  of  "size"  or  "cost"  is  needed.  On  the 
one  hand,  the  definition  should  be  independent  of  the  particular 
technology  in  question,  i.e.,  it  should  be  equally  applicable  to 
any  other  technology  that  yields  computing  systems.  On  the  other 
hand,  it  should  have  direct  practical  relevance  to  the  ability  of 
the  system  to  do  computation.  In  what  follows  we  introduce  the 
number  of  degrees  of  freedom  in  a  computing  circuit  as  the 
appropriate  measure  for  computation  power  and  assess  the 
computation  power  that  is  associated  with  optical 
interconnections  based  on  this  definition.  The  examples  that 
follow  will  demonstrate  the  computation  power  associated  with  the 
degrees  of  freedom  in  terms  of  the  number  of  problems  as  well  as 
the  complexity  of  problems  that  can  be  handled. 

IV. Degrees of Freedom

Whereas  the  speed  of  computation  and  the  size  of  the 
problems  that  can  be  handled  by  a  device  are  indeed  important 
factors  in  its  computation  power,  the  versatility  of  the  device 
becomes  the  essential  factor  of  computation  power  when  we  address 
general-purpose  computation.  If  a  device  does  one  very 
complicated  computation  task  very  quickly,  it  could  still  be  very 
hard  to  embed  an  arbitrary  computation  problem  in  this  device. 
For  example,  the  Fourier  transform  and  linear  filtering  of  two- 
dimensional  functions  come  very  naturally  in  optical  systems. 
These are not trivial operations; they are considered more
difficult than, for example, simple logic operations. However, it
is  not  as  easy  to  implement  a  wide  class  of  these  simple  logic 
operations  in  optical  systems. 

We  are  concerned  here  with  quantifying  the  versatility  of  a 
computation  device  as  a  measure  of  its  computation  power.  After 
defining  this  measure,  we  apply  it  to  optical  connections  and 
hence  measure  the  power  of  this  single  most  important 
computational  feature  of  optical  systems.  Once  we  arrive  at  this 
quantitative  measure,  we  demonstrate  that  we  can  get  as  much 
computation  from  optical  connections  as  we  should  expect  from  a 
general  device  that  has  the  same  value  of  this  measure  of 
computation  power. 

A computation device can be "programmed" to be in one out of
a  number  of  possible  states.  After  programming  the  device,  it  is 
in  the  state  for  solving  a  specific  problem.  The  number  of 
possible  states  of  a  device  is  a  measure  of  the  number  of 
different  problems  it  can  possibly  handle.  For  example,  a  Field 
Programmable  Logic  Array  (FPLA)  can  be  programmed  to  simulate  any 
of  a  large  number  of  Boolean  functions.  To  program  it  for  a 
certain  function,  we  preset  a  number  of  internal  parameters  thus 
defining  which  function  we  are  simulating.  The  number  of 
parameters  under  our  control  determines  the  number  of  ways  we  can 
program  the  chip,  hence  the  number  of  functions  we  can  simulate. 
These  parameters  constitute  degrees  of  freedom  for  computation 
versatility. 

Given  a  device  to  be  used  as  a  component  in  a  computation 
system,  we  define  the  computation  power  c  of  that  device  to  be 
the  number  of  binary  parameters  that  can  be  set  independently  to 
fix  the  characteristics  of  the  device,  i.e.,  the  degrees  of 
freedom.  In  the  general  case  of  non-binary  or  dependent 
parameters,  the  measure  is  given  by  the  logarithm  to  base  2  of 
the  number  of  different  ways  in  which  the  device  characteristics 
can  be  fixed.  For  example,  a  Programmable  Read-Only  Memory 
(PROM) with n address lines and one data line has $2^n$ memory
locations, each of which can contain 1 or 0. Each pattern of 1's
and 0's corresponds to a distinct Boolean function when the
address lines are considered as the input Boolean variables and
the data line is considered as the output Boolean function.
Therefore, the computation power c of this PROM is $2^n$. This
power is reflected in the ability to simulate a large number of functions
as well as certain functions of high complexity. It is quite
general that the ability to solve complex problems is associated
with the ability to solve a large number of problems [5].
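
The PROM example can be checked with a short Python sketch. It treats the PROM purely as a lookup table; the helper names `program_prom`, `read_prom` and `computation_power` are hypothetical, not part of any device interface, and the 3-input majority function is an arbitrary choice of function to simulate.

```python
from itertools import product
from math import log2


def program_prom(f, n):
    """PROM contents (one bit per address) that make the PROM simulate f.

    The first argument of f is taken as the most significant address bit.
    """
    return [int(bool(f(*bits))) for bits in product((0, 1), repeat=n)]


def read_prom(contents, bits):
    """Apply the Boolean inputs to the address lines and read the data line."""
    return contents[int("".join(str(b) for b in bits), 2)]


def computation_power(contents):
    """c = log2(number of distinct ways to program the PROM) = number of memory bits."""
    return log2(2 ** len(contents))


def majority(a, b, c):
    """Arbitrary example function of n = 3 Boolean inputs."""
    return a + b + c >= 2


prom = program_prom(majority, 3)
print(prom)                     # [0, 0, 0, 1, 0, 1, 1, 1]
print(computation_power(prom))  # 8.0, i.e. 2**3
```

Each of the $2^{2^n}$ possible bit patterns is a different function, which is why c equals the number of memory bits, $2^n$.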

A general device having $c = N$ can be thought of as having N
"cells" which we can load with either 0 or 1, thus fixing the
characteristics of the device in a unique manner. How does this
definition apply to optical connections? Consider two planes,
each containing MxM pixels (Figure 1), and
a hologram that determines which pixels of the first plane are
connected to which pixels of the second plane. Each connection
can be either present or absent independently of the others, and
there are $M^2 \times M^2$ such connections possible. Therefore, the
computation power of optical connections between these two planes
is given by $c = M^4$. In the case of weighted connections, the
computational impact of the weights can be incorporated in the
planar logic operations to be performed. This impact has been
found experimentally to be limited [9], which is expected from
results in threshold logic which show this to be true unless the
technology can accommodate an exponential dynamic range of
weights [7].

For M = 512, the value of c is $2^{36}$. To appreciate this
value, let us compare it to a PROM with 64K bytes of memory. In
this case, we have $2^{16} \times 8$ degrees of freedom, hence $c = 2^{19}$. The
optical  connections  are  therefore  as  powerful  as  approximately 
130,000  such  PROMs.  It  is  clear  that  using  these  PROM's,  one  can 
implement  some  functions  of  very  high  complexity.  Hence,  the 
hologram  should  be  able  to  play  a  computational  role  of  the  same 
sophistication.  We  now  demonstrate  this  role  by  example. 
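
The arithmetic behind this comparison can be reproduced in a few lines of Python. This is only a sketch of the counting argument above: it assigns one binary degree of freedom to each possible pixel-to-pixel connection and to each PROM bit.

```python
M = 512                             # pixels per side of each plane
c_hologram = (M * M) ** 2           # one binary choice per possible connection
c_prom = 64 * 1024 * 8              # bits in a PROM with 64K bytes

print(c_hologram.bit_length() - 1)  # 36, since c_hologram = 2**36
print(c_prom.bit_length() - 1)      # 19, since c_prom = 2**19
print(c_hologram // c_prom)         # 131072, i.e. about 130,000 PROMs
```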


It  is  clear  that  c  imposes  an  upper  bound  on  the  versatility 
of  a  device,  since  we  cannot  do  more  than  what  the  degrees  of 
freedom  allow  us  to  do.  However,  it  is  yet  to  be  determined 
whether  we  can  put  these  degrees  of  freedom  to  work  for  us  in  a 
general  computation  task.  In  this  section,  we  address  the 
usefulness  of  optical  connections  as  we  incorporate  them  in  a 
system designed to solve a large class of problems. We shall
discuss  the  optical  implementation  of  two  different  examples  of 
fairly  general  structure  and  computational  usefulness,  in  which 
optical  connections  play  the  central  role.  These  are  Boolean 
functions  and  associative  memories. 

Consider  a  Boolean  function  which  has  a  sum-of-products 
expansion  (Boolean  expression  formed  by  ORing  terms,  each  being 
an ANDing of Boolean variables or their negations [8]) with a
relatively  small  number  of  terms.  This  is  a  variation  of  low- 
entropy  functions  [5]  which  include  most  pattern  recognition 
decisions.  For  example,  we  fix  a  large  number  of  independent 
Boolean  variables,  say  n,  and  consider  those  functions  of  n 
variables which can be expressed as the sum of at most $2^{18}$
product  terms.  Using  optical  connections,  we  can  implement  any 
such  function  by  varying  only  the  hologram  within  an  otherwise 
simple  fixed  architecture.  In  other  words,  all  the  degrees  of 
freedom  come  from  optical  connections. 

The  architecture  follows  Figure  1.  The  first  plane  will 
have 3n pixels of unit intensity: n of these correspond to the n
variables  and  will  be  on  if  the  variables  assume  the  value  1, 
another  n  pixels  correspond  to  the  negations  of  the  variables, 
while  the  extra  n  pixels  will  be  always  on.  The  second  plane 
consists of 512x512 threshold elements with fixed threshold at
n  -  1/2  units  of  intensity.  The  simulated  function  will  have  the 
value  1  if  one  or  more  threshold  elements  are  excited.  The 
hologram  is  designed  to  simulate  the  given  function  by  assigning 
a  product  term  to  each  threshold  element  in  the  second  plane. 
Each  literal  (variable  or  negated  variable)  appearing  in  each 
product  term  is  connected  to  the  corresponding  threshold  element, 
and  if  the  number  of  literals  in  the  product  term  is  less  than  n, 
some always-on pixels are connected to the element to make the
total  number  of  connections  equal  to  n. 
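
A minimal Python sketch of this architecture follows. The encoding of product terms as dictionaries, the pixel ordering, and the helper names are assumptions made for illustration; the 3n-pixel input plane, the threshold of n - 1/2, the padding with always-on pixels, and the OR over excited elements are taken from the text. A two-term function, (x0 AND NOT x1) OR x2, stands in for a large one.

```python
from itertools import product


def build_hologram(terms, n):
    """Connection list for each threshold element (one element per product term).

    Input-plane pixels: 0..n-1 are the variables, n..2n-1 their negations,
    2n..3n-1 are always on. A term is a dict {variable index: 1 or 0}, where 0
    selects the negated literal (a hypothetical encoding).
    """
    hologram = []
    for term in terms:
        pixels = [i if value == 1 else n + i for i, value in term.items()]
        # Pad with always-on pixels so every element receives exactly n connections.
        pixels += [2 * n + k for k in range(n - len(pixels))]
        hologram.append(pixels)
    return hologram


def evaluate(hologram, x):
    """Light up the input plane for input x and OR the threshold-element outputs."""
    n = len(x)
    plane = list(x) + [1 - xi for xi in x] + [1] * n  # unit intensity where a pixel is on
    threshold = n - 0.5                               # fires only if all n inputs are lit
    return int(any(sum(plane[p] for p in pixels) > threshold for pixels in hologram))


# f(x0, x1, x2) = (x0 AND NOT x1) OR x2: two product terms, two threshold elements
hologram = build_hologram([{0: 1, 1: 0}, {2: 1}], n=3)
print([evaluate(hologram, x) for x in product((0, 1), repeat=3)])
# [0, 1, 0, 1, 1, 1, 0, 1]
```

Only `hologram` depends on the function; the input plane and the threshold elements stay fixed, which is the point made in the next paragraph.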

This  implementation  may  not  be  the  most  efficient,  but  it 
illustrates  exactly  the  title  of  this  article.  Without  the 
hologram,  the  system  is  a  simple  repetitive  structure  with  no 
problem  dependency  and  thus  no  capability  of  doing  any  real 
computation.  The  optical  connections  simulated  by  the  hologram 
fully  capture  the  computation  aspect.  A  simple  enumeration 
argument  will  show  that  a  huge  number  of  degrees  of  freedom  is 
indeed  required  to  simulate  these  functions.  This  example,  and 
more  complicated  versions  thereof,  demonstrate  the  utilization  of 
the  degrees  of  freedom  of  optical  connections. 

Another example that demonstrates the computing power
afforded by the degrees of freedom of optical interconnects is
the implementation of the nearest neighbor search operation
according to the model that was described by Hopfield [6] for
neural networks. The basic architecture is again as in Figure 1,
except that feedback is used from the output back to the input
[9]. The computing elements at the output plane are threshold
elements, one at each pixel. An interconnection between the $i$th
pixel at the input and the $j$th pixel at the output is made if
$\sum_m v_{im} v_{jm} > 0$, where $v_{im}$ is the $i$th bit of the $m$th
binary word stored in the system, each word being n bits long. When the initial state of the
system  is  set  according  to  an  external  stimulus,  the  state  of  the 
system  generally  converges  to  the  stored  binary  vector  that  is  at 
the  shortest  Hamming  distance  from  the  initial  state.  Thus  the 
system  performs  a  nearest  neighbor  search,  a  fundamental 
operation  for  pattern  recognition,  associative  memories  and  error 
correction. It is possible to implement this model with
electronic components, using locally interconnected, pipelined
(systolic) multiplier/accumulators for simulating the
interconnection matrix and an array of thresholding elements.
Thus the global communications are not essential. However, the
electronic implementation requires $n^2$ locally connected elements
in order to produce a product vector in each cycle, whereas the
optical implementation requires only n very simple computing
elements and $n^2$ communication links. The overall number of
degrees of freedom is the same in both cases, demonstrating that
each  optical  connection  in  this  example  contributes  the  same 
computing  power  to  the  overall  system  as  an  individual  computing 
element. 
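
The nearest neighbor behavior can be illustrated with a small Python/NumPy sketch. Note that it swaps in the standard Hopfield outer-product rule with bipolar (+1/-1) words and signed weights, rather than the binary connection rule $\sum_m v_{im} v_{jm} > 0$ stated above, and it uses mutually orthogonal stored words (rows of a Hadamard matrix) so that recall from a one-bit error is guaranteed in this small case. The sizes (n = 16, three stored words) and the helper names are assumptions.

```python
import numpy as np


def hadamard(order):
    """Sylvester Hadamard matrix of size 2**order; its rows are mutually orthogonal."""
    H = np.array([[1]])
    for _ in range(order):
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    return H


def store(words):
    """Outer-product (Hebbian) weights for bipolar words, with no self-connections."""
    T = words.T @ words
    np.fill_diagonal(T, 0)
    return T


def recall(T, state, max_iters=20):
    """Threshold the weighted sums and feed the output back to the input until stable."""
    for _ in range(max_iters):
        new_state = np.where(T @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state


words = hadamard(4)[1:4]  # three stored 16-bit words, coded as +1/-1
T = store(words)
probe = words[1].copy()
probe[5] *= -1            # external stimulus: Hamming distance 1 from words[1]
print(np.array_equal(recall(T, probe), words[1]))  # True
```

The matrix-vector product `T @ state` is the step that the optical connections perform in one pass; the `np.where` threshold plays the role of the output-plane elements.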

In  general,  the  dramatic  increase  in  the  degrees  of  freedom 
that  are  created  by  optically  interconnecting  a  large  number  of 
computing  elements  in  a  planar  structure  can,  in  principle,  be 
translated  to  a  proportional  increase  in  overall  computing  power. 
Two examples were given that exemplify specific methods for
tapping this potential, but the challenge still remains to
develop  algorithms  that  are  appropriate  for  computing  structures 
of  this  type  and  solve  technological  problems  that  will  make 
large  networks  of  optically  connected  computing  elements 
feasible. 

ABSTRACT


The research identified a bottleneck in electronic image processing,
namely the large number of multiplications and additions required, and
proposed solutions using optical parallel multiplication and addition.

One  technique  utilized  a  film  matrix  multiplier  and  an  array  of  LED's,  and 
the second system employed a single channel acoustooptic cell and demonstrated
near-optimal  two  dimensional  data  flow.  A  technique  for  assessing  the 
advantages  and  disadvantages  of  optical  processors  is  also  presented,  with 
examples  such  as  an  equally  fast  and  more  accurate  electronic  implementation 
of an optical matrix-vector multiplier which accomplishes an entire Fourier
transform  in  parallel. 


TECHNICAL  SUMMARY 


The  objective  of  the  research  was  to  investigate  optical  systems  capable 
of performing two-dimensional neighborhood operations. The objective arises
from the increased need for image processing and pattern recognition as
applied to vision systems for robotics, autonomous vehicles, reading machines,
and especially the SDI surveillance requirements, which would completely
overwhelm present electronic techniques.


2. Description of Work Performed and Results

The  work  was  accomplished  in  two  cycles  of  analysis  and  invention.  During 
the  first  cycle  an  analysis  of  the  weaknesses  present  in  electronic  image 
processors  was  made, followed  by  the  invention  of  an  optical  technique  which 
accomplished  the  same  general  task  but  with  a  higher  degree  of  parallelism. 

The  device  included  LED's  to  represent  image  pixel  intensities  and  a  film 
mask  used  as  the  multiplying  filter  weights. 

The  second  cycle  in  the  research  involved  analysis  of  the  new  LED-Mask 
device  including  a  serious  comparison  with  an  electronic  analog  of  the  optical 
system.  Another  optical  device  described  in  the  literature  was  similarly 
analyzed  by  comparing  it  to  an  electronic  analog  invented  for  the  sole  purpose 
of  properly  assessing  the  actual  advantages  and  disadvantages  inherent  in 
using  optics  in  the  design.  In  both  cases  the  electronic  analogs  of  the  optical 
devices  were  found  to  be  superior  to  the  optical  system  in  several  aspects, 
equivalent  on  some  issues  and  inferior  on  only  one  point. 

The  last  piece  of  the  research  was  the  invention  of  a  2-D  Convolution 
Filter  using  an  acousto-optic  cell  which  accomplishes  several  of  the  objectives 
in  speed,  data  flow  efficiency,  accuracy,  programmability  and  simplicity. 

The  area  of  a  2-D  filter  that  is  implemented  can  also  be  expanded  without 
affecting  the  speed, and  with  small  changes  in  the  data  flow  efficiency.  This 
acoustooptic  cell  device  requires  a  minimum  of  additional  components  including 
a single diode laser, one lens, two detectors and a film mask.


3. Conclusions and Recommendations


The  work  was  accomplished  in  cycles  of  analysis  and  invention  which 
facilitated  unveiling  the  true  advantages  and  disadvantages  of  several 
optical  signal  processing  systems.  The  optical  devices  that  we  proposed 
were  also  subjected  to  the  same  analysis.  The  result  is  that  we  have 
presented  an  analysis  technique  for  future  optical  processing  schemes,  as 
well  as  a  useful  optical  system  that  implements  a  large  area  two  dimensional 
convolution  filter. 

The need for efficient image processing will continue to increase, as
evidenced by the near absence of visual sensing in robotics, by
the  recent  initiation  of  reading  machines  and  initial  tests  for  autonomous 
vehicles,  and  by  the  vast  data  requirements  for  artificial  intelligence 
applications  such  as  missile  tracking  and  identification.  We  encourage  more 
research  leading  to  generic  2-D  signal  processing  because  it  is,  in  fact, 
now  a  bottleneck  in  several  fields. 

We also recommend evaluating the advantages and disadvantages of optical
processors by inventing an electronic analog of the system, which will assist
in identifying which of the features of the optical system are in fact specifically
due to the optics. Further, we expect that familiarity with digital
electronic implementations will encourage a cross-breeding of ideas with
the analog electronic and optical processing systems.

APPLICATIONS OF THE OPTICAL NEIGHBORHOOD OPERATOR


Both  linear  and  nonlinear  neighborhood  processors  will  be  of  interest 
to  SDI  for  image  manipulation  if  the  operations  can  be  fast  enough  to  meet 
the realistic needs and if the computer satisfies other needs such as small
cost,  size,  weight,  and  power  consumption.  Uses  of  neighborhood  operations 
are  either 

•  linear  (sometimes  called  convolution,  usually  with  Finite 
Impulse  Response  or  FIR  filters)  or 

•  nonlinear  (including  shrink/expand  and  median  operations). 

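We will concentrate on linear operations, which can range from simple
"derivative" operations such as the Sobel kernels

$$\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}$$

to convolution with pattern recognition templates.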
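
As a sketch of what such a linear neighborhood operator computes, the Python/NumPy code below applies the two Sobel kernels to a test image whose intensity rises by one unit per column. It uses the correlation form (the kernel is not flipped), which is the usual way these operators are applied; a strict convolution would flip the kernel and change the signs here. The helper name `neighborhood_op` is hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
SOBEL_Y = np.array([[1, 2, 1],
                    [0, 0, 0],
                    [-1, -2, -1]])


def neighborhood_op(image, kernel):
    """Apply an n x n linear neighborhood operator (correlation form, 'valid' region)."""
    n = kernel.shape[0]
    rows, cols = image.shape
    out = np.zeros((rows - n + 1, cols - n + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # n*n multiplications followed by one summation per output pixel
            out[r, c] = np.sum(image[r:r + n, c:c + n] * kernel)
    return out


ramp = np.tile(np.arange(6.0), (5, 1))  # 5 x 6 image, intensity rises by 1 per column
print(neighborhood_op(ramp, SOBEL_X))   # every entry is 8.0
print(neighborhood_op(ramp, SOBEL_Y))   # every entry is 0.0
```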

This is a powerful technique which has inspired the production of several
specialized EPU's (electronic processing units). Of these, by far the fastest
are pipelined machines which flow the neighborhood (n x n, n odd) data into the
EPU in a sequential manner, so that calculating the next pixel requires dropping
n data and adding n data. The fastest of these, the cytocomputer, requires
only 100 nsec per new pixel for n = 3. Thus for TV resolution ($0.25 \times 10^6$ pixels),
we obtain a frame time of

$$t_F = 0.25 \times 10^6 \ \frac{\text{pixels}}{\text{frame}} \times 10^{-7} \ \frac{\text{sec}}{\text{pixel}} = 0.025 \ \text{sec/frame}.$$


That is, the biggest and best is roughly "real time", but only for small images
(500 x 500) and the smallest meaningful window (3 x 3). Calculation time scales
as $N^2 n^2$ for an N x N image and an n x n window. Furthermore, we may need to
cascade neighborhood operations M times, where M = 1 to, say, 20. Thus
the total image processing time is

$$t_T = M N^2 n^2 t_0,$$

where $t_0$ is the basic operation time. Our dual objectives are to

• reduce the multiplier $N^2 n^2$ by increasing parallelism and

• reduce $t_0$ by going to optics.

It is likely that, in the process, we will also improve on electronics in size,
weight, cost, and power consumption. It is the speed increase, however, which
should have the greatest impact on SDI, by allowing bigger images (larger $N^2$)
to be processed by more sophisticated algorithms (larger n and M).
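
The timing model $t_T = M N^2 n^2 t_0$ can be evaluated directly. The sketch below assumes, beyond the text, that the cytocomputer's 100 nsec per pixel for a 3 x 3 window corresponds to $n^2 = 9$ basic operations, giving an effective $t_0$ of about 11 nsec; the 1000 x 1000 image with twenty 9 x 9 passes is a hypothetical workload chosen only to show the scaling.

```python
def total_time(M, N, n, t0):
    """t_T = M * N**2 * n**2 * t0 for M cascaded n x n passes over an N x N image."""
    return M * N**2 * n**2 * t0


# Assumption (not from the text): 100 nsec per pixel for a 3 x 3 window = 9 basic operations.
t0 = 100e-9 / 9

print(total_time(1, 500, 3, t0))    # about 0.025 s: one 3 x 3 pass over a 500 x 500 frame
print(total_time(20, 1000, 9, t0))  # about 18 s: twenty 9 x 9 passes over 1000 x 1000
```

The factor of roughly 720 between the two cases comes entirely from $M N^2 n^2$, which is the multiplier that parallelism is meant to remove.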

As  currently  designed  our  system  will  sacrifice  a  little  in  numerical 
accuracy relative to its electronic counterpart by operating in analog rather
than in 16-bit fixed point. Renormalization after each of the M cycles will
largely  mitigate  this  problem. 

So  far,  this  introduction  has  emphasized  our  approach's  advantages 
relative  to  electronics.  We  conclude  the  introduction  by  noting  its  advantage 
relative to most optical "computers". That advantage is that it can be built
now with available components to perform tasks beyond current and foreseeable
electronic capabilities. No further invention is required. No special
components need to be developed.

TECHNICAL DISCUSSION


Introduction 

Two-dimensional  signal  processing  is  a  fundamental  part  of  vision  systems 
and  is  recognized  as  a  speed  bottleneck  in  contemporary  pattern  recognition 
where matched filters and a variety of 2-D transforms are needed. The
following technical discussion is therefore focused on a few 2-D filtering systems
and compares them with analog electronic versions using the same
techniques.  Finally  a  2-D  convolution  filter  using  an  acoustooptic  cell  is 
presented  that  closely  approaches  the  ultimate  data  flow  efficiency  of  one 
output  data  point  per  input  image  pixel. 

A.  LED-Mask  Neighborhood  Operator 

Figure 1 shows nine LED's with a film filter mask, a collection lens,
and a detector. The arrangement multiplies the LED outputs (image pixels) by the film
transmittance (filter weights) and collects the products on a single detector.

The  advantage  of  this  device  is  in  providing  instantaneous  multiplication 
and  addition.  The  addition  feature  is  a  "multiplexing"  aspect  of  optics.  No 
multiplexing is available at the input, however, as one LED is required per
input pixel. Further, the LED's must be driven by individual linearized
current sources that store the pixels' levels. This means that a data-shifting
mechanism is required on the input. This system is programmable by
replacing  the  film  filter  function  and  has  limited  accuracy  due  to  the 
characteristics  of  the  LED's. 

Figure  2  shows  an  analog  electronic  version  of  the  LED-Mask  device  using 
individual  resistors  as  the  filter  function,  individual  voltage  sources  as  the 
input image pixel levels, and an output current signal which is a sum of products.
This  system  is  also  essentially  instantaneous  in  multiplication  and  summing. 

The  addition  occurs  due  to  the  multiplexing  ability  of  current  in  wires  which 
are  tied  together.  No  multiplexing  occurs  at  the  input  where  individual  voltage 
sources  are  required  that  store  the  pixel  values.  Much  higher  accuracy  is 
possible, however, due to the elimination of the nonlinear LED's and the
linear but not necessarily predictable optical efficiency of the parallel
optical paths. The resistor multiplier is not programmable unless digital-
to-analog converters are used, which do not add significant cost or complexity
at this time.
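
The resistor version of Figure 2 computes the sum of products through Ohm's law in each branch and Kirchhoff's current law at the junction. The Python sketch below idealizes the circuit (ideal voltage sources, output node held at ground); the pixel voltages and weights are made-up values, and only non-negative weights map directly onto conductances.

```python
def resistor_sum_of_products(voltages, resistances):
    """Output current of an idealized resistor neighborhood operator (cf. Figure 2)."""
    # Ohm's law per branch: I_k = V_k / R_k = V_k * G_k, so each conductance acts as a weight.
    # Kirchhoff's current law: branch currents add where the wires are tied together.
    return sum(v / r for v, r in zip(voltages, resistances))


pixels = [0.2, 0.5, 0.1, 0.7, 1.0, 0.3, 0.0, 0.4, 0.6]  # volts, one source per pixel (made up)
weights = [1, 2, 1, 2, 4, 2, 1, 2, 1]                   # filter weights as conductances (arbitrary units)
resistances = [1 / w for w in weights]                  # R_k = 1 / G_k
print(resistor_sum_of_products(pixels, resistances))    # about 8.7
```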

In  this  previous  case  an  analog  electronic  implementation  had  several 
advantages over the analog optical system, except possibly in the question
of programmability. It is evident that the multiplexing advantage attributed to optics
was not unique to optics in this case, as the electrical version was
equally effective. We also observe that the lack of multiplexing at the LED
input side of the optical device is inefficient, as it also is in the electronic
implementation.

A  digital  implementation  of  the  neighborhood  operator  utilizes  a  digital 
multiplier  and  adder.  The  use  of  several  multipliers  to  speed  up  the  operation 
leaves the digital addition function as the bottleneck. Analog addition,
whether optical or electronic, is very fast and may be preferable in this case
if analog electronic accuracy is not a problem in the application. One
implementation of this mixed analog-digital idea would use digital-to-analog
converters as multipliers and collect the products by summing the output
currents.

Analysis of the LED-driven optical neighborhood operator has revealed
several issues. First, the fast addition aspect of analog optics has
its equivalent in analog electronic addition. Second, linearizing LED's
is less accurate than, and may be as complicated as, using D/A converters in the
analog electronic version; the D/A converters also make that version more easily
programmed than one using a replaceable filter function made of film. Third, the present digital
electronic competition has accurate digital multipliers with ever-increasing
speeds and decreasing costs, but a speed bottleneck at the addition function.

B. Fourier Transform By Optical Matrix-Vector Multiplication

A discrete Fourier transform maps a vector representing a discrete set
of  time  samples,  for  example,  into  a  vector  whose  components  are  the  frequency  bins. 
This is a complex matrix-vector multiplication. It was demonstrated, as
shown in Figure 3, by Goodman, Diaz and Woody in 1978 using a series of LED's
representing the input sample vector, a film mask, and a set of detectors as
the output frequency vector. The result is a very fast Fourier transform
involving $N^2$ multiplications and $N^2$ additions, where N is the number of
LED's and detectors.
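
The matrix-vector view of the DFT can be checked numerically. This NumPy sketch builds the N x N DFT matrix that the film mask would have to represent and compares the product with `numpy.fft.fft`; it works with complex numbers directly and does not model how the film encodes complex weights. N = 8 and the random input are arbitrary choices.

```python
import numpy as np


def dft_matrix(N):
    """F[k, m] = exp(-2*pi*i*k*m/N); row k holds the weights for frequency bin k."""
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N)


N = 8
x = np.random.default_rng(0).standard_normal(N)  # N time samples (the LED inputs)
X = dft_matrix(N) @ x                            # N**2 multiplications, summed along each row
print(np.allclose(X, np.fft.fft(x)))             # True
```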

Analysis of this interesting device shows that the film mask is
complicated but easily replicated, and that good use is made of the
multiplexing aspect of optics in both the input and output of the device. This
is possible here because a matrix-vector multiplication has $N^2$ multiplications
and $N^2$ additions with only N inputs and N outputs. Other film masks can
also be used, providing different linear transformations.

This  optical  device  also  suffers  the  same  accuracy  problems  as  the  LED 
neighborhood operator, such as linearization of the LED's and imbalance among
the  many  optical  paths. 

To  assist  in  analyzing  the  optical  matrix-vector  multiplier  we  again 
present  an  analog  electronic  version  as  shown  in  Figure  4.  The  side  view 
shows  one  column  of  the  resistors  which  make  up  the  resistor  array  which 
replaces  the  film  mask.  The  input  vector  (voltages)  is  impressed  on  the 
rows  of  resistors  and  the  output  currents  are  collected  along  the  columns. 
Addition  is  again  accomplished  by  tying  together  wires  to  collect  the  output 
currents.  Multiplexing  occurs  both  at  the  input  and  output  by  tying  together 
the  rows  and  columns,  respectively,  on  opposite  sides  of  the  resistor  array. 
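
An idealized model of the Figure 4 resistor array: each column current is the sum over rows of input voltage times conductance, so the array computes a matrix-vector product. This NumPy sketch assumes ideal wires and column outputs held at virtual ground (assumptions, not from the text); the conductance values are illustrative only.

```python
import numpy as np


def crossbar_currents(v_rows, G):
    """Column currents of an ideal resistor array: I_col = sum over rows of V_row * G[row, col]."""
    # G[row, col] is the conductance (1/R) of the resistor joining that row and column.
    return v_rows @ G


G = np.array([[1.0, 0.5],
              [0.25, 2.0],
              [0.0, 1.0]])      # 3 input rows x 2 output columns (illustrative values)
v = np.array([1.0, 2.0, 4.0])  # input vector impressed on the rows as voltages
print(crossbar_currents(v, G))  # [1.5 8.5]
```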


The analog electronic version of the matrix-vector multiplier utilizes
multiplexing at the input and output, demonstrates fast multiplication and
addition, has good accuracy, and can probably be built as an integrated
circuit. The electronic version is not programmable in the way the optical version is,
though replacing the optical mask may be as complicated as replacing the entire
integrated electronic chip. The optical version may have the advantage that
film can provide a medium for a matrix with over 1000 x 1000 points, while
the integrated electronic version would require development. A matrix built
with discrete resistors would seem unreasonable, requiring one million
resistors, solder joints, etc.


Analyzing  the  optical  matrix-vector  multiplier  by  inventing  an  analog 
electronic  version  has  brought  out  some  of  the  advantages  and  disadvantages 
of using optics. The multiplexing and addition aspects of optics are notable
but, in the cases we examined, are not unique to optics. Especially desirable is
the fact that replication of the film matrix for the Fourier transform array
would  be  straightforward,  as  it  also  would  be  for  other  useful  linear 
transformations. 

Implementing a discrete Fourier transform by digital electronics involves
many  multiplications  and  additions.  Specialized  processors  are  now  available 
which  use  several  digital  multipliers  and  are  very  efficient  with  data  flow. 
However  these  would  still  be  much  slower  than  either  the  optical  or  electronic 
matrix  systems  that  we  have  been  analyzing. 

C. 2-D Neighborhood Operator Using A Single Channel Acousto-Optic Cell Demonstrates Near Ultimate Data Flow Efficiency

This  efficient  image  processing  device  can  accomplish  two  dimensional 
convolution  with  a  large  filter  function  (for  example  9x9)  using  a  single 
channel  acoustooptic  cell.  Figure  5  shows  the  device  which  utilizes  a  film 
filter  function  mask  and  an  AO  cell  to  introduce  the  image  information.  The 
resulting  optical  signal  is  collected  by  a  lens  and  focused  on  the  output 
detector. 

A  digital  electronic  implementation  involves  multiplying  81  pixel  levels 
(9x9  filter)  by  the  filter  weights,  and  adding  them  to  give  each  output  data 
point. In an inefficient system this involves recalling the image data 81 times.
In a more efficient pipelined implementation, columns of data would be provided
in steps, while shifting the columns inside the processor. This technique
requires 9 input pixels per output data point, instead of 81, since one-dimensional
data flow is being used. If 2-D data flow were possible, one
input pixel would yield, on the average, one output data point, which is the
ultimate.
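
The pipelined, one-dimensional data flow described above can be sketched in Python/NumPy as a window buffer that shifts its stored columns and fetches one new column of n pixels per output. The helper name and the random test data are hypothetical; the sketch counts fetched pixels so that the ratio of about n inputs per output can be seen.

```python
import numpy as np


def pipelined_row(image, kernel, r):
    """Slide an n x n window along rows r..r+n-1, fetching one new n-pixel column per step.

    Returns the outputs for this band and the number of image pixels fetched.
    """
    n = kernel.shape[0]
    window = image[r:r + n, :n].copy()       # initial fill costs n*n pixels
    fetched = n * n
    outputs = [float(np.sum(window * kernel))]
    for c in range(n, image.shape[1]):
        window = np.roll(window, -1, axis=1)  # shift the stored columns inside the processor
        window[:, -1] = image[r:r + n, c]     # fetch one new column of n pixels
        fetched += n
        outputs.append(float(np.sum(window * kernel)))
    return outputs, fetched


rng = np.random.default_rng(1)
img, k = rng.random((9, 100)), rng.random((9, 9))
outs, fetched = pipelined_row(img, k, 0)
print(len(outs), fetched, fetched / len(outs))  # 92 outputs, 900 pixels, about 9.8 per output
```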


The  AO  cell  2-D  convolver  can  approach  within  10%  of  the  ultimate  data 
flow  efficiency  of  one  output  data  point  per  input  pixel.  This  is 
accomplished  by  a  pseudo-two-dimensional  data  flow  in  a  one  dimensional 
acoustooptic  cell.  The  image  enters  the  AO  cell  as  a  series  of  columns 
which  are  each  90  pixels  long  rather  than  the  9  pixels  required  by  the 
desired 9x9 convolution filter. Nine such extra-long columns, which sit
side by side in the image, fit in tandem in the AO cell. Of the 810 pixels
represented  in  the  AO  cell  only  81  pixels  will  be  used  at  any  one  moment. 

A mask adjoining the AO cell simultaneously selects the desired 9 pixels from
each of the nine long columns and multiplies them by the film-mask filter
function. Light from a pulsed laser source illuminates the AO cell, is
multiplied by the filter function, and is collected on the detector, giving a
single neighborhood calculation. Each step of the column data through the AO
cell gives a new output data point, corresponding to the filter mask stepping
down the long columns of the image. Ninety steps in the AO cell give 81
output points. After the ninety steps, the next column to the right in the
image is inserted, step by step, into the AO cell. With the film mask
adjoining the AO cell unmoved, all the columns in the AO cell now face a new
set of filter weights, which corresponds to the filter function moving one
pixel to the right on the original image. The result is efficient data flow
in two dimensions, first by inserting one whole new column at a time, and
second by making the columns much longer than the filter height requires.
This implementation of a large two-dimensional convolution filter requires
each image data point only 1.1 times on average, which is within 10% of the
ultimate data flow efficiency.
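The pseudo-two-dimensional data flow described above can be checked with a small software model. The sketch below is an illustration under stated assumptions, not a description of the actual hardware: a Python `deque` stands in for the AO cell, holding K image columns of length L in tandem; a fixed map from delay-line positions to weights stands in for the film mask; and the sum over the masked samples stands in for the lens and detector. The function names `ao_cell_correlate` and `direct_correlate` are hypothetical; the model is checked against a direct 2-D correlation.

```python
# Illustrative sketch; function names are hypothetical, not from the original study.
from collections import deque
import random


def ao_cell_correlate(image, weights):
    """Software model of the single-channel AO cell neighborhood operator.

    image: list of rows (column length L, width W); weights: K x K mask.
    Pixels are streamed column by column, one per step, into a delay line
    of K*L samples, i.e. K long image columns sitting in tandem.
    Returns (output_rows, pixels_read, outputs_computed).
    """
    L, W, K = len(image), len(image[0]), len(weights)
    cell = deque([0] * (K * L), maxlen=K * L)  # cell[p]: pixel that entered p steps ago
    # Fixed "film mask": delay-line position -> weight. Position c*L + r holds
    # the pixel r rows above and c columns left of the newest pixel.
    taps = {(K - 1 - j) * L + (K - 1 - i): weights[i][j]
            for i in range(K) for j in range(K)}
    out = [[0] * (W - K + 1) for _ in range(L - K + 1)]
    reads = computed = 0
    for col in range(W):
        for row in range(L):
            cell.appendleft(image[row][col])  # one new pixel enters; the oldest drops out
            reads += 1
            if row >= K - 1 and col >= K - 1:  # a full K x K window is in the cell
                # Lens + detector: sum of the K*K masked products.
                out[row - K + 1][col - K + 1] = sum(w * cell[p] for p, w in taps.items())
                computed += 1
    return out, reads, computed


def direct_correlate(image, weights):
    """Reference result: direct 2-D correlation at every valid window position."""
    K = len(weights)
    L, W = len(image), len(image[0])
    return [[sum(weights[i][j] * image[r + i][c + j]
                 for i in range(K) for j in range(K))
             for c in range(W - K + 1)]
            for r in range(L - K + 1)]


if __name__ == "__main__":
    random.seed(0)
    L, W, K = 90, 10, 9  # 90-pixel columns and a 9x9 mask, as in the text
    img = [[random.randint(0, 255) for _ in range(W)] for _ in range(L)]
    mask = [[random.randint(-3, 3) for _ in range(K)] for _ in range(K)]
    out, reads, computed = ao_cell_correlate(img, mask)
    assert out == direct_correlate(img, mask)
    print("outputs per 90-pixel column:", L - K + 1)
    print("steady-state pixels read per output:", L / (L - K + 1))
```

In this idealized model each 90-pixel column yields L - K + 1 = 82 valid outputs, about 1.10 pixels read per output in steady state; the text quotes 81 outputs per 90 steps (about 1.11). Either way the figure is within roughly 10% of one read per output. The mask is applied without flipping (correlation); a true convolution would use the mask rotated by 180 degrees.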




Analysis 


The single-channel AO cell neighborhood operator uses the large
time-bandwidth product of the AO cell for data storage and shifting. Only a
fraction of the AO cell is ever used to modulate light, which is inefficient.
The data storage and shifting yield the near-ultimate data flow efficiency,
as can be seen by considering a similar digital electronic convolver. In the
digital version, extra-long data delay lines feed 81 multipliers and adders,
so that image pixels are fetched from main memory only 1.1 times per output
data point. The digital system is complex, with 81 digital multipliers and
adders. The summation required at the end makes the digital system more
complicated, whereas the optical system performs it with only a lens and a
detector. The fact that the data required for multiplication arrive in the
same order as they are found in the image makes little difference to the
digital system, but vastly simplifies the AO cell multiplier (optical
modulator).

Accuracy decreases as the number of pixels stored in a given AO cell
increases, and this trades off against data flow efficiency and filter size.
As mentioned before, this AO cell convolver is very wasteful of AO cell
accuracy: all 810 pixels must be stored in the cell, but only 81 of them
(about 10%) are used to modulate light at any one moment. However, the
convenience of combining data storage, data shifting, and light modulation in
one device is notable and well known.

Summarized below are the important characteristics of our AO cell
convolver.

1. Near-ultimate data flow efficiency (9 output data points per 10 input
pixels).

2. Large filters handled (9x9).

3. Replaceable filter function (film).

4. Minimum number of sources and detectors.

5.  Limited  accuracy  (20-30  dB  estimated). 

6.  Requires  unusual  data  sequence  from  image. 

7.  Practical  now  with  off-the-shelf  parts. 

8.  Ten  times  faster  than  desired  video  frame  rate. 

9.  Extraordinary  optical  parallelism  in  an  extremely  simple  device. 


RESEARCH  ON  OPTICAL  COMPUTING  ALGORITHMS  AND  ARCHITECTURES 


William T. Rhodes, Project Director and Principal Investigator
Thomas K. Gaylord, Principal Investigator


GEORGIA INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING
Atlanta, Georgia 30332




Abstract 

Research  on  new  architectural  and  algorithmic  approaches  to  optical 
computing  has  been  conducted  in  the  areas  of  1)  optical  degrees  of  freedom  and 
devices  for  controlling  them,  2)  ultra-short  optical  pulses  and  nonlinear 
optics,  3)  number  representations,  4)  content-addressable-memory  processors, 
and  5)  integrated  optical  Givens  rotation  devices.  Results  and 


Technical  Summary 


1.  Objectives 

The principal objective of this research program is the conceptual
development of new architectural and algorithmic approaches to
ultra-high-speed computing using optical and optoelectronic techniques.


2.  Description  of  Work  Performed  and  Results 

2.1 Optical Degrees of Freedom and Devices for Controlling Them

A  review  was  conducted  of  the  properties  of  light  that  can  be  controlled 
in  an  optical  computer  (e.g.,  polarization,  propagation  direction,  wavelength, 
amplitude,  phase,  intensity),  means  for  controlling  them,  and  the  advantages 
and  disadvantages  of  different  methods,  including  speed  of  operation  and  energy 
consumption. This was done to provide the basis for a study of optical
computer architectures unprejudiced by notions of which basic light-control
operations should be employed, and to allow as much flexibility as possible
in conceptual architectural design.


2.2  Ultra-Short  Optical  Pulses  and  Nonlinear  Optics  [1] 

A  preliminary  study  has  been  conducted  of  ways  in  which  nonlinear  optical 
phenomena  can  be  used  with  ultrashort  optical  pulses  to  enhance  the 
capabilities  of  optical  computers.  Ultrashort  pulses  are  of  interest  because 
of  their  potential  for  exploiting  the  full  available  bandwidth  of  the  optical 
source. It has been determined that nonlinear optical interactions can be used
in various ways to allow the cascading of high-speed
content-addressable-memory-based optical computing systems and to compensate for loss and
aberrations.  Nonlinear  optical  interactions  that  exhibit  both  high  speed  and 
high  efficiency  (both  of  which  are  necessary  for  optical  computer  systems)  are 
achievable  only  with  optical  pulses  of  high  peak  power.  Mode-locked  lasers 
and  optical  pulse  compression  systems  produce  such  pulses  every  nanosecond  or 
so.  Unfortunately,  schemes  proposed  thus  far  for  exploiting  the  full  temporal 
frequency  bandwidth  of  the  light  source  [2]  result  in  reduced  peak  optical 
power  and,  hence,  reduced  efficiency  in  the  nonlinear  interactions.  Attention 
is  now  being  given  to  methods  for  avoiding  this  problem. 


2.3  Number  Representation  [3] 

Preliminary  results  have  been  obtained  in  the  investigation  of  number 
representations  for  optical  computing  systems.  Binary  coding,  multilevel 
coding,  and  residue  number  systems  have  been  analyzed  in  terms  of  the  primitive 
operations  of  addition  and  multiplication.  Examples  of  fixed-radix  and  residue 
number  representations  have  been  calculated  with  and  without  multilevel  coding. 
A detailed comparison has been made for the case of 16-bit full-precision
addition and multiplication. This example has indicated a clear advantage of
using multilevel coding.

2.4 Content-Addressable Memory Processors [4]

Preliminary results have been obtained showing the use of optical
content-addressable memory processors in non-primitive operations such as
discrete matched filtering (cross correlation). The design of an optical
holographic truth-table look-up system that processes multilevel coded
numbers has been developed.

2.5  Integrated  Optical  Givens  Rotation  Device  [5] 

The  Givens  rotation  operation  plays  a  central  role  in  matrix  formulations 
of  linear  algebraic  signal  processing.  A  design  concept  for  an  integrated 
optical  device  that  implements  this  operation  has  been  developed.  The  device 
uses  electronically-controlled  thick  grating  diffraction  to  control  optical 
wave  amplitudes  in  accord  with  the  desired  rotation  operation.  It  has  been 
shown  that  existing  electro-optic  phase  shifting  and  grating  diffraction 
devices  can  be  combined  to  produce  an  extremely  fast  Givens  rotation  device. 
Operations  that  can  be  performed  by  such  a  device  include  matrix 
triangularization,  matrix  inversion,  solution  of  least  squares  problems, 
singular  value  decomposition,  and  the  calculation  of  eigenvalues  and 
eigenvectors. 
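To illustrate the operation itself (not the optical device), the sketch below computes a Givens rotation that zeroes the second component of a two-element vector and then applies a sequence of such rotations to triangularize a small matrix, the first step of the QR-based methods listed above. The names `givens` and `triangularize` are hypothetical; in the proposed device the wave amplitudes would instead be controlled by electronically controlled grating diffraction rather than computed in software.

```python
# Illustrative sketch; function names are hypothetical.
import math


def givens(a, b):
    """Return (c, s, r) such that [[c, s], [-s, c]] applied to [a, b] gives [r, 0]."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0, 0.0  # nothing to rotate
    return a / r, b / r, r


def triangularize(A):
    """Reduce a square matrix (list of rows) to upper-triangular form R
    with Givens rotations, as in a QR decomposition. Returns a new matrix."""
    R = [row[:] for row in A]
    n = len(R)
    for j in range(n):                 # column being cleared
        for i in range(n - 1, j, -1):  # zero R[i][j] by rotating rows i-1 and i
            c, s, _ = givens(R[i - 1][j], R[i][j])
            for k in range(n):
                top, bot = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
    return R


R = triangularize([[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 1.0, 5.0]])
for row in R:
    print([round(v, 6) for v in row])  # entries below the diagonal are numerically zero
```

Each rotation has determinant one, so the product of the diagonal of R equals the determinant of the original matrix, a convenient check on accuracy.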

3.  Conclusions  and  Recommendations 

It  is  premature  at  this  stage  to  draw  many  conclusions.  However,  in  terms 
of  program  direction  we  think  that  the  content-addressable-memory  work  is 
particularly  important,  for  it  shows  promise  for  optical  computer  architectures 
capable  of  exploiting  both  the  spatial  and  the  temporal  potential  of  optics. 
Particular  attention  should  be  given  to  multiple-input-multiple-output  systems 
because  of  their  significance  in  parallel  processing  generally. 

The Givens rotation device is important in its own right because of its possible
applications. However, it is also significant because of the way it exploits
natural  physical  phenomena  for  performing  operations  not  easily  performed  on  a 
binary-logic-based  electronic  computer.  The  basic  approach  discussed  in  an 
attachment  needs  to  be  studied  further  in  terms  of  accuracy  and  speed 
achievable,  and  related  architectures  and  algorithms  should  be  investigated. 

Nonlinear  optics  used  in  conjunction  with  ultrashort  optical  pulses  can  in 
principle  solve  many  of  the  problems  associated  with  wideband  operation  of 
cascadable logic-based optical computer subsystems. There is, however, a
conflict  between  the  need  for  high  peak-power  optical  pulses  (for  high- 
efficiency  nonlinear  interactions)  and  the  temporal  modulation  of  the  light 
waves  necessary  for  exploiting  the  full  optical  bandwidth.  This  conflict  must 
be  studied  further  and  somehow  resolved.  Further,  the  energy  and  efficiency 
characteristics  of  nonlinear  optical  devices  under  development  should  be 
considered in connection with specific (e.g., strawman) systems.
Truth-table look-up processing: number representation, multilevel
coding, and logical minimization
Abstract. The need for ultra-high-speed computing for a variety of modern
processing problems has generated new interest in using truth-table look-up
techniques. Further, due to the frequently parallel nature of these processing
problems, optical systems appear promising for these applications. The
basic principles of truth-table look-up processing are reviewed in this paper.
The issues of number representation, multilevel coding, and logical
minimization are discussed. Example fixed-radix and residue number
representations are given with and without multilevel coding. Logical
reduction techniques are discussed with examples. A comparison of the number
of truth-table entries needed for 16-bit full-precision addition and
multiplication is given, illustrating the advantage of the multilevel coded
residue number representation.

Subject terms: digital optical computing; truth-table look-up processing; optical data
processing; number representation.
1. INTRODUCTION

1.1.  The  need  for  ultra-high-speed  computing 

The number of areas in need of computing power well beyond that currently
available is large and increasing. High through-
1.2. Truth-table look-up processing as a possible solution
Many functions, transformations, and operations may be
represented by a binary truth table in which the outputs are
given for all possible input combinations. Direct implementation
of processors from a truth-table representation has not
been common in the history of data and signal processing. This
is largely due to the numerous efficient algorithms that can be
programmed on general-purpose von Neumann-type computers.
However, the types of problems listed above are largely
beyond the capabilities of present-day computing systems.

These  problem  areas  have  emphasized  the  growing  need  for 
parallel application of the same algorithm to large arrays of
data.  This,  in  turn,  has  generated  renewed  interest  in  the  direct 
implementation  of  truth-table-based  processors. 

Many of these processing problems are highly complex and
computationally intensive. However, the solutions can frequently
be expressed in terms of matrix-based algorithms.7 In
these, a single operation is repeated many times over many
elements. This highly regular nature lends itself naturally to
parallel processing and to truth-table look-up techniques. The
pronounced structure of these algorithms has not been efficiently
utilized in past data processing systems.

There are three general architectures for truth-table implementation.
These involve using (1) location-addressable
memory, (2) content-addressable memory, and (3) hardware
logic gates. These basic architectures are discussed in Sec. 2.

Gate arrays and programmable array logic (PAL) devices that
are in widespread use today are electronic implementations of
truth tables. In another example, off-line a priori calculations
are used to prestore in memory the controllers for given speed
ranges of a fighter aircraft. This is necessary since the required
calculations cannot be performed in real time and thus are
obtained by look-up. Papachristou* has presented an encoding
scheme for a direct truth-table implementation of discrete and
residue-based functions (see Sec. 3.2) that employs PAL devices.
Truth-table look-up has been used for changing the function
in optical cellular logic to implement two-dimensional
logical neighborhood functions for applications such as digital
image processing.* Look-up methods have been used to find the
correct mappings required to implement a residue matrix-vector
multiplier.10 Discrete matched filtering and other functions
can be implemented by truth-table look-up techniques.11
Ishihara12 has described the use of truth-table look-up in the
design of optical processing systems by the joint
university-governmental-industrial Optical Computer Group of the Japanese
Society of Applied Physics. Potentially, many complex
problems can be treated with look-up methods.
1.3. Issues associated with truth-table look-up processing
The viability of truth-table look-up processing for a particular
application depends on a number of critical issues. These
include (1) architectural implementation, (2) number representation,
(3) number encoding, and (4) logical reduction and/or
minimization. In an electronic or optical hardware logic gate
implementation, the resulting number of gates and the number
of interconnections are determined by the truth-table representation
used. Perhaps more important, the needed routes of the
interconnections are prescribed by the final form of the truth
table used. In a bulk optical configuration using a content-addressable
memory, the truth table used specifies the amount
of storage needed (e.g., number of holographically recorded
reference patterns).13 For this case, however, the number and
form of the interconnections specified are of no particular
significance, since the interconnections are made optically in
three-dimensional space and their routing is automatically
taken into account in the original design of the system. This is in
dramatic contrast to very-large-scale integration (VLSI)
circuits, in which the form of the interconnections is
typically the limiting factor in the design of complex systems.

2. TRUTH-TABLE LOOK-UP PROCESSING ARCHITECTURE

2.1.  Location-addressable  memories 
The most straightforward implementation of a truth table may
be achieved by storing the entire truth table in a direct, or
location-addressable, memory (LAM) such as an electronic
read-only memory (ROM). These systems require a memory
size (in bits) of

$$S = 2^{p} q, \qquad (1)$$

where p is the number of input bits and q is the number of
output bits. In these processors, the inputs determine the
address of the answer.
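Equation (1) can be made concrete with a small sketch, assuming a location-addressable table built in software: the p input bits are concatenated into an address, and the q-bit answer is read out directly. The example uses 2-bit addition modulo 4 (p = 4, q = 2) and also evaluates Eq. (1) for 16-bit binary addition (p = 32, q = 17); the helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
def lam_size_bits(p, q):
    """Eq. (1): storage S = 2**p * q bits for a full location-addressable table."""
    return (2 ** p) * q


def build_lam(func, in_bits):
    """Precompute func(x, y) for every pair of in_bits-bit operands.

    The address is x (high bits) concatenated with y (low bits), so the
    inputs directly determine the location of the answer.
    """
    mask = (1 << in_bits) - 1
    return [func(addr >> in_bits, addr & mask) for addr in range(1 << (2 * in_bits))]


def lam_lookup(table, x, y, in_bits):
    return table[(x << in_bits) | y]


add_mod4 = build_lam(lambda x, y: (x + y) % 4, 2)
print(lam_lookup(add_mod4, 3, 2, 2))  # (3 + 2) mod 4 -> 1
print(lam_size_bits(4, 2))            # 16 words x 2 bits -> 32
print(lam_size_bits(32, 17))          # 16-bit addition -> 73014444032 bits
```

The last line shows why a full LAM is impractical for wide operands: about 73 gigabits for a single 16-bit adder.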

2.2. Content-addressable memories
Less storage is generally required when a truth table is implemented
using a content-addressable memory (CAM). Such
memories may use electronic, magnetic, optical, or other
technologies. The unity-result truth tables for each output bit
are stored in the CAM. A unity-result or a null-result truth table
may be constructed from those combinations of inputs that
cause a particular output bit to be a "one" or a "zero," respectively.
The unity-result truth table represents the canonical
sum-of-products expression for the logical function corresponding
to each output bit.
In a content-addressable memory, inputs are compared with
the stored tables, and detected matches cause the appropriate
output bits to be a "one" (if a unity-result truth table is stored) or
a "zero" (if a null-result truth table is stored). The stored input
words (or "reference patterns" in pattern recognition terminology)
are the function minterms in the sum-of-products expression
(unity-result truth table) or the function maxterms in the
product-of-sums expression (null-result truth table). The number
of function minterms for each output bit for addition and
multiplication using the residue number system has been compiled.14
This number is always less than or equal to the number
of function maxterms due to the inherent structure associated
with the operations of addition and multiplication. In the optical
holographic implementation of content-addressable memory,
the number of function minterms represents the number of
holograms that need to be stored in the system.10 Using thick
holographic recording media, such as photorefractive lithium
niobate, holograms may be multiplexed together in a common
volume13 with the number of possible stored holograms being
on the order of a thousand.16
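A content-addressable memory can be emulated in a few lines to make the unity-result idea concrete. The sketch below is a software stand-in for the holographic CAM, under the assumption of exact-match comparison: for each output bit it stores the input words (minterms) that make that bit a one, and an output bit is set when the input matches any stored pattern. Addition modulo 4 with 2-bit binary-coded operands serves as the example; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
def unity_result_tables(func, in_bits, out_bits):
    """For each output bit, the set of input words (x, y) for which that bit is 1.

    These are the function minterms, i.e. the reference patterns a CAM stores.
    """
    tables = [set() for _ in range(out_bits)]
    for x in range(1 << in_bits):
        for y in range(1 << in_bits):
            z = func(x, y)
            for k in range(out_bits):
                if (z >> k) & 1:
                    tables[k].add((x, y))
    return tables


def cam_lookup(tables, x, y):
    """Compare the input word with every stored pattern; any match sets that output bit."""
    z = 0
    for k, patterns in enumerate(tables):
        if any((x, y) == ref for ref in patterns):  # done in parallel in an optical CAM
            z |= 1 << k
    return z


tables = unity_result_tables(lambda x, y: (x + y) % 4, in_bits=2, out_bits=2)
print([len(t) for t in tables])  # stored patterns per output bit -> [8, 8]
print(cam_lookup(tables, 3, 3))  # (3 + 3) mod 4 -> 2
```

Here 16 patterns replace the 16-word LAM; the savings grow when minterms are few relative to all input combinations, as in residue arithmetic.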

2.3. Hardware logic gates

A truth table may also be implemented through the direct use of
Boolean logic gates. Each binary output variable, when represented
as a sum of products (or product of sums) of binary input
variables, may be implemented with three levels of logic in the
form of a programmable array logic device. For a sum-of-products
form, the sequence of logic gates is NOT, AND, OR.
For a product-of-sums form, it is NOT, OR, AND. The number
of function minterms represents the number of AND gates (in
a sum-of-products implementation) that must be formed to realize
each output bit.

3.  NUMBER  REPRESENTATION 
3.1.  Fixed-radix  system 

In many ways, the manner in which numbers are represented
places subtle and fundamental limitations on what types of
calculations can be efficiently performed on them. One need
only try to do calculations with Roman numerals to realize the
impact and importance of the positional (Arabic) number system.
Innovations such as the concept of zero as a position-holding
digit, negative numbers, and the representation of fractions
in a positional number system required centuries of evolutionary
development.17 It is only a sense of provincialism that
causes the decimal system to be viewed as a uniquely appropriate
number system. Historically, satisfactory progress in
many areas of mathematics and engineering has been limited by
number representation.1* Number systems and number representation
are again becoming the subject of increasing study for
achieving faster and more efficient computation.
In a fixed-radix number system, any number $N_b$ in a radix
(base) b may be represented by the digits $a_n a_{n-1} \cdots a_0 . a_{-1} \cdots a_{-m}$, such that

$$N_b = \sum_{i=-m}^{n} a_i \, b^{i}, \qquad (2)$$

where $a_i$ is any of the b digits allowed. The integers n and m
control the size and precision of the number. The bases for the
binary, octal, decimal, and hexadecimal number systems are 2,
8, 10, and 16, respectively. The ranges of digits are 0 and 1; 0
through 7; 0 through 9; and 0 through 9 and A through F,
respectively. For example, $1101_2 = 15_8 = 13_{10} = \mathrm{D}_{16}$.
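Equation (2) can be checked numerically. The sketch below is an illustration, not part of the original study: it evaluates the positional sum for integer digit strings (the fractional digits $a_{-1}, \ldots, a_{-m}$ are omitted for brevity) and converts a nonnegative integer to its digits in any base from 2 to 16. The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
DIGITS = "0123456789ABCDEF"


def from_digits(digits, b):
    """Evaluate N = sum(a_i * b**i) for an integer digit string, most significant digit first."""
    n = 0
    for ch in digits:
        a = DIGITS.index(ch.upper())
        if a >= b:
            raise ValueError(f"digit {ch!r} not allowed in base {b}")
        n = n * b + a  # Horner form of the positional sum
    return n


def to_digits(n, b):
    """Digits of a nonnegative integer n in base b (2 <= b <= 16)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, a = divmod(n, b)  # least significant digit first
        out.append(DIGITS[a])
    return "".join(reversed(out))


# The example from the text: 1101 (base 2) = 15 (base 8) = 13 (base 10) = D (base 16)
print([from_digits(s, b) for s, b in [("1101", 2), ("15", 8), ("13", 10), ("D", 16)]])
print([to_digits(13, b) for b in (2, 8, 10, 16)])  # -> ['1101', '15', '13', 'D']
```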

A property of all fixed-radix number systems is the interdependence
of digit results in numerical operations. In addition
and multiplication this is manifested as a carry digit propagating
from lower to higher significant digits. As a result, the
most significant bit of a result cannot be known until calculation
of all lesser significant bits has been completed. Thus, carry
propagation represents a fundamental limitation for high-speed
digital electronic data processing systems.

In  truth-table  look-up  processing,  all  digits  of  the  answer  can 
be  calculated  simultaneously.  However,  carry  propagation 
produces  a  very  undesirable  effect  in  these  processors.  Because 
the  output  digits  depend  on  all  lesser  significant  input  digits,  the 
truth  tables  can  become  enormous.  As  the  number  of  input 
digits  increases,  the  size  of  the  resulting  truth  table  increases 
exponentially. Clearly, a number system without interdigit
dependence would be highly desirable to avoid these unmanageably
large truth tables. The residue number system described
below has no such interdigit dependence.
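The interdigit dependence can be demonstrated by brute force for small word lengths. The sketch below (hypothetical helper `depends_on`) flips each input bit of an n-bit binary addition and records which flips can change output bit k. In this exhaustive check, sum bit k (for k < n) depends on exactly the 2(k + 1) input bits $x_0 \ldots x_k$ and $y_0 \ldots y_k$, so a truth table for that bit alone needs $2^{2(k+1)}$ rows, growing exponentially with bit position.

```python
# Illustrative sketch; the helper name is hypothetical.
def depends_on(n, k):
    """Input bits whose flip can change bit k of x + y, for n-bit operands x and y.

    Returned as a set of ('x', j) and ('y', j) pairs, found by exhaustive search.
    """
    deps = set()
    for x in range(1 << n):
        for y in range(1 << n):
            bit = ((x + y) >> k) & 1
            for j in range(n):
                if ((((x ^ (1 << j)) + y) >> k) & 1) != bit:
                    deps.add(("x", j))
                if (((x + (y ^ (1 << j))) >> k) & 1) != bit:
                    deps.add(("y", j))
    return deps


n = 4
for k in range(n + 1):  # bit n is the carry out
    d = depends_on(n, k)
    print(k, len(d), 2 ** len(d))  # bit position, input bits involved, truth-table rows
```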


3.2. Residue number system

Unlike the commonly used decimal and binary number systems,
the residue number system (RNS) is an unweighted system. The
base of a residue system is chosen as n relatively prime (containing
no common factors) numbers $m_1, m_2, \ldots, m_n$, called
moduli. Any integer X is then represented as an n-tuple $(x_1, x_2,
\ldots, x_n)$, where $x_i = |X|_{m_i}$ (meaning X mod $m_i$). This representation
is unique if the range of X is less than or equal to M, where

$$M = \prod_{i=1}^{n} m_i$$

represents the dynamic range. Negative numbers can be
included by an arbitrary partitioning of the range of the number
system.
The important feature of RNSs is that the fixed-point arithmetic
operations can be performed on each digit individually.
That is, if $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ are two
numbers of the same system, then $Z = X \circ Y = (z_1, z_2, \ldots, z_n)$,
where $z_i = |x_i \circ y_i|_{m_i}$ for $i = 1, 2, \ldots, n$, and $\circ$ represents the
addition, subtraction, or multiplication operation. Division
may be performed, but it is difficult,1*1* except when the remainder
is zero.

As an illustrative example, consider a set of four moduli
{3, 4, 5, 7}. In this system, the decimal numbers X = 23 and Y = 14
are represented as X = (2, 3, 3, 2) and Y = (2, 2, 4, 0). The results of
performing addition, subtraction, and multiplication on these
numbers are X + Y = (1, 1, 2, 2), X - Y = (0, 1, 4, 2), and X × Y =
(1, 2, 2, 0), which are the residue representations of the correct
answers, i.e., 37, 9, and 322, respectively.
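The example above can be reproduced with a short sketch. The code below is illustrative (function names hypothetical): it encodes integers as residues, performs addition, subtraction, and multiplication digit by digit with no carries between moduli, and recovers the ordinary integer with the Chinese remainder theorem, which is valid for results in the range 0 to M - 1.

```python
# Illustrative sketch; function names are hypothetical. Requires Python 3.8+.
from itertools import combinations
from math import gcd, prod


def to_rns(x, moduli):
    return tuple(x % m for m in moduli)


def rns_op(a, b, moduli, op):
    """Apply op digit by digit; each residue depends only on its own modulus."""
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(r, moduli):
    """Chinese remainder theorem reconstruction, giving a value in [0, M)."""
    M = prod(moduli)
    x = 0
    for ri, m in zip(r, moduli):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # pow(a, -1, m) is the inverse of a modulo m
    return x % M


moduli = (3, 4, 5, 7)
assert all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))  # relatively prime
X, Y = to_rns(23, moduli), to_rns(14, moduli)
for name, op in [("+", lambda a, b: a + b), ("-", lambda a, b: a - b), ("x", lambda a, b: a * b)]:
    Z = rns_op(X, Y, moduli, op)
    print(name, Z, from_rns(Z, moduli))  # residues and the decoded integer
```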

In residue arithmetic, there are a number of basic operations
that are difficult to perform. These are division, scaling, sign
detection, overflow detection, and relative-magnitude determination.
In spite of these difficulties, the fact that the calculations
associated with different moduli are independent of each other
makes RNSs suitable for parallel processing. An especially
significant increase in the number of operations per second is
achieved when the calculations are composed of residue addition
and multiplication (e.g., in matrix-matrix multiplication or
discrete Fourier transformation).

The cyclic nature of residue arithmetic makes it particularly
suitable for optical implementations. Using the cyclic property
of the phase or the polarization of light, a number of numerical
optical residue processors have been developed.21'1* Hughes
Research Laboratories has constructed an electronic residue
arithmetic digital image understanding system (RADIUS) that
performs 5x5 pixel generalized convolution operations on 8-bit
pixels.27 The moduli used are 31, 29, 23, and 19, the four largest
primes less than $2^5 = 32$. Additions and multiplications are
performed at high speed by truth-table look-up of the residues
for each radix from a random-access memory (RAM). Binary-to-residue
and residue-to-binary conversions are also accomplished
by truth-table look-up.

Moduli selection is an important issue in the design of any
system based on residue arithmetic. Many system parameters
are affected by the moduli set, and conflicting requirements may
make the selection difficult. For example, to reduce the execution
time for all operations that involve a mixed-radix conversion,
it is desirable to have as few moduli as possible; hence,
large moduli are preferred. On the other hand, the hardware of
most systems increases rapidly with the size of the moduli;
therefore, using a large number of small moduli has the advantage
of decreasing the complexity of the system.

There are other considerations that imply different selections,’*
such as increasing the storage efficiency, having unity
multiplicative inverses, and having unity multipliers in the
Chinese remainder theorem. A procedure for selecting the
moduli that are optimum in the sense of requiring the minimum
number of reference patterns is given in Ref. 14.
As mentioned previously, the lack of interdigit dependence
makes RNSs potentially extremely useful in reducing the
required number of reference patterns that need to be stored.
This is a very powerful feature in content-addressable memory
applications. For example, consider the addition of two 16-bit
numbers. If the usual binary system is used to represent
the numbers, a total of 36,507,222,016 reference patterns,
each a 32-bit word, need to be stored in a CAM for
truth-table look-up processing. However, using the moduli set
(4, 5, 7, 9, 11, 13), the number decreases to only 694 patterns of
4-bit to 8-bit words. As shown in subsequent sections, further
reduction in the number of reference patterns can be obtained
by applying multilevel coding and logical minimization
techniques.
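The 694 figure can be reproduced under one plausible assumption about how it was counted: each residue is binary-coded in the fewest bits that hold 0 through m - 1, and the count is the total number of unity-result entries (minterms) over all output bits of all moduli, before any minimization. The sketch below (hypothetical `reference_patterns`) performs that count and also checks that the moduli are pairwise relatively prime, with a dynamic range large enough for the sum of two 16-bit numbers.

```python
# Illustrative sketch; the counting convention and helper name are assumptions.
from itertools import combinations
from math import gcd, prod


def reference_patterns(m):
    """Unity-result entries for addition mod m with binary-coded residues."""
    out_bits = (m - 1).bit_length()  # bits needed to code the residues 0..m-1
    count = 0
    for k in range(out_bits):
        count += sum(1 for x in range(m) for y in range(m)
                     if (((x + y) % m) >> k) & 1)
    return count


moduli = (4, 5, 7, 9, 11, 13)
assert all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
assert prod(moduli) > 2 * (2**16 - 1)  # 180180 covers any sum of two 16-bit numbers
print({m: reference_patterns(m) for m in moduli})
print(sum(reference_patterns(m) for m in moduli))  # -> 694
```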

4.  MULTILEVEL  CODING 

4.1.  Encoding 

Multilevel coding has recently been used as a technique for
further reducing the number of truth-table entries (reference
patterns) that need to be stored." Multilevel coding is an extension
of binary coding in which more than two levels are used.
For example, in three-level (ternary) coding, the integers zero to
eight are represented as 00, 01, 02, 10, 11, 12, 20, 21, and 22,
respectively. Minimization of multilevel coded reference patterns
requires a type of logic different from the commonly used
binary logic. The appropriate logic, known as multiple-valued
logic, is an active area of research today.
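Multilevel coding is positional coding with more than two levels per digit. The sketch below (hypothetical `multilevel_code` and `digits_needed`) produces fixed-width codes with any number of levels up to ten; with three levels and width 2 it reproduces the listing of 0 through 8 given above. It also compares how many binary and ternary digits are needed to code the residues of the moduli set used in the previous section.

```python
# Illustrative sketch; function names are hypothetical.
def multilevel_code(n, levels, width):
    """Fixed-width code of a nonnegative integer n with `levels` signal levels per digit."""
    if n >= levels ** width:
        raise ValueError("width too small for n")
    digits = []
    for _ in range(width):
        n, d = divmod(n, levels)
        digits.append(str(d))
    return "".join(reversed(digits))


def digits_needed(m, levels):
    """Smallest width w with levels**w >= m, enough to code residues 0..m-1."""
    w = 1
    while levels ** w < m:
        w += 1
    return w


print([multilevel_code(i, 3, 2) for i in range(9)])
for m in (4, 5, 7, 9, 11, 13):
    print(m, digits_needed(m, 2), digits_needed(m, 3))  # binary vs ternary digits
```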

Although significant progress has been made in the theoretical
aspects of multiple-valued logic, there have been only a small
number of electronic implementations of this logic. The first
full-scale three-valued electronic computer was completed in
1958 at Moscow State University in the Soviet Union.2* Electronic
ternary logic has been used in constructing arithmetic
units.2* Multiple-valued operation of integrated circuits has
been investigated at the Naval Research Laboratory.50 Currently,
for example, the Intel 8087 floating-point processor uses
four levels of current in ROMs. In optics, shadow-casting techniques
have been used to implement multiple-valued logic.51
The fact that there have not been more implementations is
partly due to the difficulties in realizing multilevel devices and
partly due to the significant progress that has been achieved in
binary logic systems. However, as has recently been
shown" multilevel coding in some optical systems can be
implemented as easily as binary coding.

4.2.  Example  systems 

Some examples of decimal, binary, residue, binary-coded
residue, multilevel coded residue, and binary-coded multilevel-coded
residue number representations are presented in Table I.
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5.  LOGICAL  MINIMIZATION 
5.1.  Forms  of  logical  reduction  results 
Procedures for the logical reduction of a truth table may produce results in a variety of forms. For logical functions expressed as a sum of products, these include (1) the near-minimal sum, (2) the minimal sum, and (3) the absolute minimal sum. A near-minimal sum is generally obtained by using a nonexhaustive reduction technique. In these methods, the sum-of-products expression is greatly reduced but not necessarily minimized. These techniques can be very fast computationally because they do not consider all reduction possibilities.32 A minimal sum is a reduced sum-of-products expression that has the minimum possible number of terms. Such forms are often called “sloppy minimal sums” in the literature.33 An absolute minimal sum is a reduced sum-of-products expression that has both the minimum possible number of terms and the minimum number of factors (or variables) in each term. This form is also called the “real minimal sum” and sometimes (confusingly) the “minimal sum.”
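The following example illustrates the distinction. Product terms are written as cubes over {0, 1, -}, a notation chosen only for this sketch, and a sum of products is scored as (number of terms, number of literals). Both candidates realize f = a + b with the minimum of two terms, so both are minimal sums. Only the candidate with fewer literals is the absolute minimal sum.

```python
def term_minterms(term):
    """Minterm indices covered by a product term written as a cube, e.g. '1-0'."""
    indices = [0]
    for ch in term:
        bits = (0, 1) if ch == '-' else (int(ch),)   # '-' marks a variable absent from the term
        indices = [2 * i + b for i in indices for b in bits]
    return set(indices)


def sum_minterms(terms):
    """Minterms of a sum of products (union over its terms)."""
    return set().union(*(term_minterms(t) for t in terms))


def sop_cost(terms):
    """(number of product terms, total number of literals)."""
    return (len(terms), sum(ch != '-' for t in terms for ch in t))


# f(a, b, c) = a + b written two ways, both with the minimum of two terms
candidates = [['1--', '01-'], ['1--', '-1-']]
assert sum_minterms(candidates[0]) == sum_minterms(candidates[1])
print([sop_cost(c) for c in candidates])   # [(2, 3), (2, 2)]
print(min(candidates, key=sop_cost))       # ['1--', '-1-']
```

This only compares candidates that are already known. Finding them is the job of the procedure described next.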

As an example, the logical minimization for the simple case of addition modulus 4 is shown in Fig. 1. The method illustrated in this figure represents only one particular approach to logical minimization. However, in general, the steps in minimization are (1) define the initial truth table for the operation in question; (2) find all prime implicants; (3) construct the table of choice; and (4) obtain a minimal sum. Each of these steps is discussed in subsequent sections. An alternative technique, the use of the Karnaugh map,34 allows a minimal sum to be obtained directly without using the steps listed here. It is a graphical method that allows the minimal sum to be visualized directly, but it is impractical for functions of more than about five variables. The steps listed above, however, can be programmed on a computer and can, in principle, handle any number of input variables.
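Step (1) for the Fig. 1 example can be sketched as follows. This sketch assumes the input ordering a1 a0 b1 b0. If the figure orders the inputs differently, the minterm numbers are permuted but the result is unchanged.

```python
def residue_add_truth_table(modulus=4, bits=2):
    """Minterm lists for each output bit of (a + b) mod modulus.

    Inputs are assumed to be ordered a_(bits-1) ... a_0 b_(bits-1) ... b_0, so the
    pair (a, b) has minterm index a * 2**bits + b. Input codes >= modulus never
    occur; they are simply omitted here (a minimizer could treat them as don't-cares).
    """
    table = {k: [] for k in range(bits)}      # k = output bit position (0 = LSB)
    for a in range(modulus):
        for b in range(modulus):
            s = (a + b) % modulus
            for k in range(bits):
                if (s >> k) & 1:
                    table[k].append(a * 2 ** bits + b)
    return table


t = residue_add_truth_table()
print("MSB minterms:", t[1])   # [2, 3, 5, 6, 8, 9, 12, 15]
print("LSB minterms:", t[0])   # [1, 3, 4, 6, 9, 11, 12, 14]
```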


5.2.  Finding  prime  implicants 

As shown in Fig. 1, the function is first specified and then coded. In this case, since the modulus is 4, binary coding is used directly. From these results, the truth tables for the most-significant bit (MSB) and the least-significant bit (LSB) may be constructed directly. Then the logical expressions may be written. All of the prime implicants33 of a logical function can be determined by the Quine-McCluskey method.35,36 This method is summarized in numerous textbooks37 and is illustrated in Fig. 1 for the MSB. The minterms are listed in subgroups, starting with those that have a single “one” in them, then those with two “ones” in them, and so on, until all minterms are listed in the first group. Then, all pairs of minterms that differ in only one factor are checked. These pairs are listed in a second group with “don't care” dashes at the location of the differing factor for the pair. The process of combining terms that differ in only one factor is then continued until no further combining is possible. For the example in Fig. 1, no further combining is possible in the second group. All unchecked terms in the prime implicant table constitute the list of all prime implicants.
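The following sketch implements the prime-implicant step. It assumes the same a1 a0 b1 b0 ordering as above. For simplicity, it compares every pair of terms instead of only adjacent “number of ones” subgroups. Only terms whose counts of ones differ by one can merge, so grouping is a speed-up and does not change the result. Don't-care inputs are not handled in this sketch.

```python
from itertools import combinations


def combine(a, b):
    """Merge two terms that differ in exactly one specified (non-dash) position."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None


def prime_implicants(minterms, nbits):
    """Quine-McCluskey prime-implicant generation; terms are strings over {'0','1','-'}."""
    current = {format(m, f'0{nbits}b') for m in minterms}
    primes = set()
    while current:
        merged, checked = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                checked.update((a, b))
        primes |= current - checked      # terms never combined are prime implicants
        current = merged
    return primes


msb = [2, 3, 5, 6, 8, 9, 12, 15]   # MSB minterms of (a + b) mod 4, inputs a1 a0 b1 b0
print(sorted(prime_implicants(msb, 4)))
# ['0-10', '001-', '0101', '1-00', '100-', '1111']
```

As in the text, the second group (001-, 0-10, 100-, 1-00) cannot be combined further. Minterms 5 and 15 never combine, so they are prime implicants on their own.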
As the number of variables increases, the Quine-McCluskey method of determining the prime implicants becomes inefficient in terms of execution time and required memory. More efficient methods include the Tison algorithm33 and the tree-structured approach of Morreale.38 An even more efficient modified tree-structure method has been developed recently by Guest.39

5.3. Constructing table of choice

In  the  Quine-McCluskey  method,  after  the  prime  implicants 
have  been  determined,  a  table  of  choice  is  constructed.  This 
consists  of  all  the  prime  implicants  (listed  vertically  in  Fig.  1) 
and  all  the  function  minterms  (listed  horizontally  in  Fig.  1). 
Each  prime  implicant  row  is  marked  in  the  columns  of  the 
minterms  covered  by  that  prime  implicant.  Thus,  it  is  observed 
how  the  entries  in  the  initial  truth  table  are  covered  by  the  prime 
implicants. 
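The table of choice is simply a mapping from each prime implicant to the minterms it marks. The sketch below builds that mapping and prints it with X marks. It uses the MSB prime implicants found above; the printed layout is only this sketch's rendering and is not the layout of Fig. 1.

```python
def covers(implicant, minterm):
    """True if the cube (e.g. '0-10') contains the given minterm index."""
    bits = format(minterm, f'0{len(implicant)}b')
    return all(p in ('-', b) for p, b in zip(implicant, bits))


def table_of_choice(primes, minterms):
    """One row per prime implicant -> set of minterm columns it marks."""
    return {p: {m for m in minterms if covers(p, m)} for p in sorted(primes)}


msb = [2, 3, 5, 6, 8, 9, 12, 15]
primes = ['0101', '1111', '001-', '0-10', '100-', '1-00']
table = table_of_choice(primes, msb)
print(' ' * 5 + ' '.join(f'{m:>2}' for m in msb))
for p, cols in table.items():
    print(p, ' '.join(' X' if m in cols else ' .' for m in msb))
```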

5.4.  Obtaining  minimal  sum 

A minimal sum is obtained by finding the minimal prime-implicant covering of the table of choice. This is referred to in the literature as the covering problem, the set-covering problem, or the minimum-covering problem.33 An absolute minimal sum may be found using the following steps. Note that this absolute minimal sum may not be unique; there may be other absolute minimal sums that can be obtained by changing the order in which the selections below are made.

Step  one:  Select  essential  rows.  Some  rows  uniquely  cover 
some  of  the  columns.  These  rows  must  be  selected  in  order  to 
cover  those  columns.  The  prime  implicants  associated  with 
these  rows  are  called  essential  prime  implicants.  The  essential 
rows  and  all  columns  that  contain  marks  in  these  rows  should 
be  eliminated  from  the  table  of  choice. 

Step  two:  Eliminate  dominated  rows.  One  row  dominates  a 
second  row  if  the  first  row  has  marks  in  all  columns  in  which  the 
second  row  has  marks.  If  the  dominating  row  has  the  same  or 
fewer  variables  in  its  prime  implicant,  the  prime  implicant 
associated  with  the  dominated  row  should  be  eliminated  from 
the  table  of  choice.  If  a  minimal  sum,  rather  than  an  absolute 
minimal  sum,  is  satisfactory,  then  the  number  of  variables  in  the 
prime  implicants  need  not  be  compared. 

Step  three:  Eliminate  dominating  columns.  Similarly,  one 
column  dominates  a  second  column  if  the  first  column  has 
marks  in  all  rows  in  which  the  second  column  has  marks.  The 
minterms  associated  with  the  dominating  columns  should  be 
eliminated  from  the  table  of  choice. 

Step  four:  Repeat  steps  one  through  three  until  all  columns 
are  eliminated.  The  resulting  sum  of  all  essential  row  prime 
implicants  represents  the  absolute  minimal  sum. 
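Steps one through four can be written as a loop over a table of choice. The sketch below represents the table as {implicant: set of minterms}, a choice made for this sketch only. It applies the literal-count condition of step two. If the loop stops with columns still present, the leftover table is cyclic, which is the case discussed next.

```python
def literals(implicant):
    """Number of specified (non-dash) variables in a cube."""
    return sum(ch != '-' for ch in implicant)


def reduce_table(table, minterms):
    """Apply steps one to four; return (selected rows, remaining rows, remaining columns)."""
    rows = {p: set(c) & set(minterms) for p, c in table.items()}
    cols = set(minterms)
    chosen = []
    while cols:
        progress = False
        # Step one: select essential rows (the only row marking some column).
        for m in sorted(cols):
            if m not in cols:
                continue
            covering = [p for p, c in rows.items() if m in c]
            if len(covering) == 1:
                p = covering[0]
                chosen.append(p)
                cols -= rows.pop(p)
                progress = True
        rows = {p: c & cols for p, c in rows.items() if c & cols}
        # Step two: drop a dominated row if the dominating row has no more literals.
        for p in sorted(rows):
            for q in sorted(rows):
                if (p != q and p in rows and q in rows
                        and rows[q] <= rows[p] and literals(p) <= literals(q)):
                    del rows[q]
                    progress = True
        # Step three: drop a column that dominates another column.
        for m1 in sorted(cols):
            for m2 in sorted(cols):
                if m1 != m2 and m1 in cols and m2 in cols:
                    r1 = {p for p, c in rows.items() if m1 in c}
                    r2 = {p for p, c in rows.items() if m2 in c}
                    if r2 <= r1:
                        cols.discard(m1)
                        progress = True
        rows = {p: c & cols for p, c in rows.items() if c & cols}
        # Step four: repeat until no columns remain or nothing changes.
        if not progress:
            break
    return chosen, rows, cols


msb_rows = {'0101': {5}, '1111': {15}, '001-': {2, 3},
            '0-10': {2, 6}, '100-': {8, 9}, '1-00': {8, 12}}
chosen, rest, cols = reduce_table(msb_rows, [2, 3, 5, 6, 8, 9, 12, 15])
print(sorted(chosen), cols)   # ['0-10', '001-', '0101', '1-00', '100-', '1111'] set()
```

For the MSB of addition modulus 4, every row turns out to be essential, so step one alone empties the table.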

There  are  cases,  however,  in  which  the  above  procedure  is 
unable  to  eliminate  all  columns.  The  remaining  table,  which 
contains  at  least  two  marks  in  each  column,  is  called  a  cyclic 
table.  In  this  case,  the  tabular  method  using  a  recursive  branch- 
and-bound  algorithm  presented  by  Muroga33  may  be  used  to 
obtain  the  absolute  minimal  sum. 
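Muroga's branch-and-bound procedure is not reproduced here. The sketch below is a much simpler stand-in. It searches exhaustively for the smallest set of remaining rows that covers all remaining columns, breaking ties by total literal count. Its cost grows exponentially with the number of rows, so it is only reasonable for small cyclic tables. The example is the familiar cyclic function with minterms 0, 1, 2, 5, 6, and 7.

```python
from itertools import combinations


def literals(implicant):
    """Number of specified (non-dash) variables in a cube."""
    return sum(ch != '-' for ch in implicant)


def exhaustive_cover(rows, cols):
    """Fewest rows covering every column; ties broken by total literal count."""
    if not cols:
        return []
    names = sorted(rows)
    for k in range(1, len(names) + 1):
        best = None
        for combo in combinations(names, k):
            covered = set().union(*(rows[p] for p in combo))
            if cols <= covered:
                cost = sum(literals(p) for p in combo)
                if best is None or cost < best[0]:
                    best = (cost, list(combo))
        if best is not None:
            return best[1]
    return None   # the columns cannot be covered by these rows


cyclic = {'00-': {0, 1}, '0-0': {0, 2}, '-01': {1, 5},
          '-10': {2, 6}, '1-1': {5, 7}, '11-': {6, 7}}
print(exhaustive_cover(cyclic, {0, 1, 2, 5, 6, 7}))   # one of the three-term covers
```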
Minimization techniques in multiple-valued logic are somewhat different from those used in binary logic. In binary logic, if two terms in a sum-of-products expression are the same in all bit positions except one, they can be combined into one term that has a “don't-care” bit at that location. For example, 100 and 101 can be combined as 10X, where X represents a “don't-care” bit. In multiple-valued logic, terms can be combined in several ways. For example, in ternary logic, the terms 120, 121, and 122 can be reduced to 12X, where X (referred to as a “complete-don't-care” digit) represents a digit with possible values of 0, 1, and 2. If one of the above terms is absent, the other two can still be combined. For example, the terms 120 and 121 can be reduced to 12X₀₁, where X₀₁ (referred to as a “partial-don't-care” digit) represents a digit with possible values of 0 and 1, but not 2.
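The combining rule can be sketched by storing each digit position as the set of values it may take. In the printed output, X{01} is this sketch's ASCII stand-in for the partial-don't-care digit X₀₁. The sketch covers only the merging of two terms. It is not the extended Quine-McCluskey program referred to in the following paragraph.

```python
def term(digits):
    """A ternary product term as a tuple of allowed-value sets, e.g. term('120')."""
    return tuple(frozenset({int(d)}) for d in digits)


def merge_terms(a, b):
    """Combine two terms that agree in every digit position but one."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + (a[i] | b[i],) + a[i + 1:]


def render(t, radix=3):
    """Write X for a complete don't-care digit and X{..} for a partial one."""
    out = []
    for values in t:
        if len(values) == 1:
            out.append(str(next(iter(values))))
        elif len(values) == radix:
            out.append('X')
        else:
            out.append('X{' + ''.join(map(str, sorted(values))) + '}')
    return ''.join(out)


partial = merge_terms(term('120'), term('121'))
print(render(partial))                            # 12X{01}
print(render(merge_terms(partial, term('122'))))  # 12X
```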

As the number of entries in a truth table increases, the minimization procedure becomes too complex to be handled by hand. In connection with the present work, a computer program has been developed to reduce the reference patterns for an arbitrary coding level and to obtain the minimum number of required patterns. The Quine-McCluskey technique was extended to handle the multiple-valued logic case. In the first part of the program, a complete list of the prime implicants is obtained. Using this set, a table of choice is constructed. Then, a minimal sum set is obtained by applying the reduction rules to the table. The results for residue addition and multiplication for moduli 2 through 32 are given in Ref. 11. These results show that the number of reference patterns can be decreased significantly if the appropriate level of coding is used. If the modulus can be expressed as M = p^n, where p is a prime number and n is a positive integer greater than one, p-level coding is the best choice. For example, binary coding is appropriate for moduli such as 4, 8, 16, and 32, while ternary coding is beneficial for moduli such as 9 and 27. This is due to the highly regular structures of the truth tables that are produced in these cases.
For a modulus that is not expressible in the above form, the proper coding level can be found among its prime factors. The prime factor that produces the largest contribution to the modulus is usually the best choice. For example, binary coding is appropriate for modulus 12 (= 2² × 3), while modulus 6 (= 2 × 3) benefits from ternary coding.
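The rule of thumb above can be sketched as a small helper. It returns p for M = p^n with n > 1. Otherwise it returns the prime factor p whose power p^k contributes most to M. For a prime modulus it returns None, because the stated rule does not cover that case. Note that the helper does not restrict itself to the two-, three-, and five-level codings used in the tables, so it is a heuristic and not a substitute for the minimization program.

```python
def prime_factors(m):
    """Prime factorization of m as {prime: exponent}, by trial division."""
    factors, d = {}, 2
    while d * d <= m:
        while m % d == 0:
            factors[d] = factors.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors


def suggested_coding_level(modulus):
    """Heuristic coding level for a residue modulus (None for a prime modulus)."""
    f = prime_factors(modulus)
    if len(f) == 1:
        (p, n), = f.items()
        return p if n > 1 else None
    # largest prime-power contribution p**k to the modulus
    return max(f, key=lambda p: p ** f[p])


print([suggested_coding_level(m) for m in (16, 27, 12, 6, 7)])   # [2, 3, 2, 3, None]
```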

The optimum sets of moduli for minimizing the number of reference patterns for performing 16-bit full-precision addition and multiplication are given in Table II. Results for both binary-coded residue numbers and multilevel-coded residue numbers are given before and after logical minimization.
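The unreduced entries can be checked with a short script, given one assumption about how patterns are counted. The assumption is one reference pattern for each nonzero output digit of each input pair, summed over all modulus × modulus pairs. This counting rule is inferred here and is not stated in the text. It does reproduce legible Table II entries such as 63 and 117 (binary-coded addition, moduli 7 and 9) and 170 (binary-coded multiplication, modulus 11).

```python
import operator


def nonzero_digits(value, base):
    """Number of nonzero digits when value is written in the given base."""
    count = 0
    while value:
        if value % base:
            count += 1
        value //= base
    return count


def unreduced_patterns(modulus, base, op):
    """Assumed unreduced pattern count: nonzero output digits over all input pairs."""
    return sum(nonzero_digits(op(a, b) % modulus, base)
               for a in range(modulus) for b in range(modulus))


print(unreduced_patterns(7, 2, operator.add))   # 63
print(unreduced_patterns(9, 2, operator.add))   # 117
print(unreduced_patterns(11, 2, operator.mul))  # 170
```

Under the same assumption, summing over a complete set of moduli gives totals for the whole 16-bit operation, for example with the binary-coded multiplication set 5, 7, 9, 11, 13, 16, 17, 19, 23.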
6. OPTICAL IMPLEMENTATION

An optical implementation of a truth-table look-up data processing system using a holographic content-addressable memory is described in Ref. 11. The optical system presented is capable of processing multilevel-coded numbers. The operations of addition, multiplication, and discrete matched filtering (cross-correlation) are evaluated in terms of the number of required reference patterns for various word lengths.
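As a software stand-in for the content-addressable look-up, the sketch below compares an input word against every reduced reference pattern, with '-' matching either bit. An output bit is set when any of its patterns matches. The patterns are the modulus-4 addition results derived earlier, under the assumed input ordering a1 a0 b1 b0. The holographic system of Ref. 11 performs the comparisons optically in parallel and is not modeled here.

```python
def matches(pattern, word):
    """True if every specified position of pattern equals word ('-' = don't care)."""
    return all(p in ('-', w) for p, w in zip(pattern, word))


# Reduced reference patterns for (a + b) mod 4 (assumed input order a1 a0 b1 b0).
REFERENCE_PATTERNS = {
    'MSB': ['0101', '1111', '001-', '0-10', '100-', '1-00'],
    'LSB': ['-1-0', '-0-1'],          # LSB = a0 XOR b0
}


def cam_lookup(word, patterns=REFERENCE_PATTERNS):
    """Each output bit is 1 when any of its reference patterns matches the input."""
    return {bit: int(any(matches(p, word) for p in plist))
            for bit, plist in patterns.items()}


def add_mod4(a, b):
    out = cam_lookup(format(a, '02b') + format(b, '02b'))
    return 2 * out['MSB'] + out['LSB']


print(add_mod4(3, 2))   # 1
```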
7. DISCUSSION AND SUMMARY

Truth-table look-up processing concepts, implementations, and applications have been reviewed. Because of current and future needs for ultra-high-speed computing, there has been increasing interest in truth-table look-up techniques. Further, because many modern processing problems are highly parallel, optical systems are serious candidates for these applications.

The issues of number representation, multilevel coding, and logical minimization have been discussed. The number of entries (reference patterns) in the reduced truth table is of central importance in determining the viability of look-up techniques for a particular application. In hardware logic gate implementations (electronic or optical), the number of gates and the number of interconnections are prescribed by the logically reduced form of the truth table used. In location-addressable memory and content-addressable memory implementations, the reduced truth table specifies the amount of storage required. In these cases, the interconnections are of no particular significance, in marked contrast to the hardware logic gate case. In an optical content-addressable memory, the truth-table look-up processing can be performed in parallel.

For comparison, the numbers of truth-table entries for 16-bit full-precision addition and multiplication are given in Table III. Results are supplied for binary and residue representations. Values are given with and without logical minimization and with and without multilevel coding. The dramatic reduction obtained by using residue number systems is apparent. Further significant reductions are obtained by using logical minimization and multilevel coding. Thus, number representation, multilevel coding, and logical minimization are all significant factors in truth-table look-up processing.
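The size of the binary entry can be sanity-checked under one assumption: the unreduced count is the total number of 1-entries (minterms) across all 17 output bits of a 16-bit adder with no carry-in. This interpretation is inferred here, not stated in the text. Each sum bit is 1 for exactly half of the 2^32 input pairs. The carry-out is 1 for the 2^16(2^16 - 1)/2 pairs whose sum reaches 2^16. The sketch checks this closed form against brute force for small word lengths.

```python
def addition_minterm_count(n):
    """Brute force: 1-entries over all n + 1 output bits of an n-bit adder (no carry-in)."""
    return sum(bin(a + b).count('1') for a in range(2 ** n) for b in range(2 ** n))


def addition_minterm_count_closed_form(n):
    # n balanced sum bits, plus a carry-out that is 1 for 2**n * (2**n - 1) / 2 pairs
    return n * 2 ** (2 * n - 1) + 2 ** n * (2 ** n - 1) // 2


print(addition_minterm_count_closed_form(16))   # 36507189248, about 3.65e10
```

Under this assumption the result agrees with the 3.65 mantissa shown for binary addition in Table III.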

8. ACKNOWLEDGMENTS

This work was supported in part by a grant from the Strategic Defense Initiative Office, administered through the Office of Naval Research, and by a grant from the Joint Services Electronics Program.



0T- 103/Gaylord 


Opt. Eng. January 1986




9. REFERENCES

1. A. L. Robinson, Science 203(4376), 156 (1979).

2. Computer 16(6) (1983).

3. Phys. Today, special issue on Advances in Computers for Physics, 37(5), 61 (1984).

4. Proc. IEEE, special issue on Supercomputers: Their Impact on Science and Technology, 72(1) (1984).

5. W. M. Brown, IEEE Trans. Aerospace Electron. Sys. AES-3(2), 217.

6. W. M. Brown, G. G. Houser, and R. E. Jenkins, IEEE Trans. Aerospace Electron. Sys. AES-9(2), 166 (1973).

7. K. Bromley, “An interview with Keith Bromley on signal processing,” Optical Engineering Reports No. 14, p. 1, SPIE (February 1985).

8. C. A. Papachristou, IEEE Trans. Comput. C-32(10), 961 (1983).

9. T. Yatagai, S. Inaba, H. Nakano, and M. Suzuki, “Automatic flatness tester for very large scale integrated circuit wafers,” Opt. Eng. 23(4), 401 (1984).

10. S. F. Habiby and S. A. Collins, Topical Meeting on Optical Computing, Technical Digest, pp. TuD4-1 to TuD4-4, OSA, Washington (1985).

11. M. M. Mirsalehi and T. K. Gaylord, submitted to Appl. Opt.

12. S. Ishihara, Topical Meeting on Optical Computing, Technical Digest, TuE2, OSA, Washington (1985).

13. C. C. Guest and T. K. Gaylord, Appl. Opt. 19(7), 1201 (1980).

14. C. C. Guest, M. M. Mirsalehi, and T. K. Gaylord, IEEE Trans. Comput. C-33(10), 927 (1984).

15. J. E. Weaver and T. K. Gaylord, “Evaluation experiments on holographic storage of binary data in electro-optic crystals,” Opt. Eng. 20(3), 404 (1981).

16. D. L. Staebler, W. J. Burke, W. Phillips, and J. J. Amodei, Appl. Phys. Lett. 26(4), 182 (1975).

17. J. R. Newman, The World of Mathematics, pp. 430-520, Simon & Schuster, New York (1956).

18. R. M. Kline, Digital Computer Design, p. 19, Prentice-Hall, Englewood Cliffs, NJ (1977).

19. N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology, McGraw-Hill, New York (1967).

20. E. Kinoshita, H. Kosako, and Y. Kojima, IEEE Trans. Comput. C-22(2), 134 (1973).

21. A. Huang, in Proc. Int. Optical Computing Conference, p. 14, IEEE, New York (1975).

22. A. Huang, Y. Tsunoda, J. W. Goodman, and S. Ishihara, Appl. Opt. 18(2), 149 (1979).

23. A. Tai, I. Cindrich, J. R. Fienup, and C. C. Aleksoff, Appl. Opt. 18(16), 2812 (1979).

24. S. A. Collins, Jr., “Numerical optical data processor,” in Effective Utilization of Optics in Radar Systems, B. W. Vatz, ed., Proc. SPIE 128, 313 (1977).

25. J. N. Polky, D. D. Miller, and R. L. Guimann, “Optical residue arithmetic data processing,” Boeing Aerospace Company, Report AFWAL/AADO-2, Wright-Patterson AFB, Ohio (1982).

26. J. Jackson and D. Casasent, Appl. Opt. 22(18), 2817 (1983).

27. S. D. Fouse, G. R. Nudd, and A. D. Cummings, in Proc. 6th Int. Conf. on Pattern Recognition, pp. 262-269, IEEE, New York (1982).

28. D. C. Rine, ed., Computer Science and Multiple-Valued Logic, North-Holland, Amsterdam (1977).

29. P. Sebastian and Z. G. Vranesic, in Proc. 1972 Symp. on Theory and Applications of Multiple-Valued Logic, SUNY, Buffalo, New York (1972).

30. G. Abraham, Computer 7(9), 42 (1974).

31. R. Arrathoon and S. Kozaitis, “Shadow-casting for multiple-valued associative logic,” Opt. Eng. 25(1) (1986).

32. Z. Arevalo and J. G. Bredeson, IEEE Trans. Comput. C-27(11), 1028.

33. S. Muroga, Logic Design and Switching Theory, p. 163, John Wiley & Sons.

34. M. Karnaugh, AIEE Trans. Commun. & Electron. 72(5), 593 (1953).

35. W. V. Quine, Am. Math. Monthly 59(10), 521 (1952).

36. E. J. McCluskey, Bell Sys. Tech. J. 35(6), 1417 (1956).

37. T. L. Booth, Digital Networks and Computer Systems, John Wiley & Sons, New York (1971).

38. E. Morreale, IEEE Trans. Electron. Comput. EC-16(3), 611 (1967).

39. C. C. Guest, “Holographic optical digital parallel processing,” Ph.D. thesis, Georgia Institute of Technology (1983).

Fig. 1 The process of logical minimization for residue addition modulus 4 is illustrated, including the steps of defining the initial truth table, finding all prime implicants, constructing the table of choice, and finding the absolute minimal sum.
TABLE II. Optimum Sets of Moduli, Coding Levels, and Numbers of Reference Patterns for Performing 16-Bit Full-Precision Addition and Multiplication Using Binary-Coded and Multilevel-Coded (Allowing for Two-, Three-, and Five-Level Coding) Residue Number Systems*

                                       Addition                        Multiplication
                              Modulus  Coding   Number of     Modulus  Coding   Number of
                                       level    reference              level    reference
                                                patterns                        patterns

Binary-coded residue             4       2         16            5       2         20
(without logical reduction)      5       2         25            7       2         54
                                 7       2         63            9       2        102
                                 9       2        117           11       2        170
                                11       2        187           13       2        264
                                13       2        286           16       2        392
                                                                17       2        528
                                                                19       2        666
                                                                23       2       1056

Binary-coded residue             3       2          6            5       2         15
(after logical minimization)     5       2         19            7       2         18
                                 7       2         36            9       2         65
                                11       2         90           11       2         84
                                13       2        116           13       2        115
                                16       2         60           16       2    (illegible)
                                                                17       2    (illegible)
                                                                19       2        266
                                                                23       2        381

Multilevel-coded residue         4     3 or 5      12            5       5         16
(without logical reduction)      5       5         20            7       5         42
                                 7       5         49            9     3 or 5      84
                                 9       5         99           11       5        140
                                11       5        154           13       5        216
                                13       5        234           16       5        328
                                                                17       5        400
                                                                19       5        522
                                                                23       5        792

Multilevel-coded residue         4       2          8            5     2 or 3      15
(after logical minimization)     5       3         18            7       2         18
                                 7       2         36            9       3         30
                                 9       3         36           11       5         78
                                11       3         89           13       3        108
                                13       5        113           16       2         44
                                                                17       3        178
                                                                19       5        242
                                                                23       3        360

*Addition of two 16-bit words produces a 16-bit sum with an output carry bit (no input carry bit). Multiplication of two 16-bit words produces a full 32-bit product (rather than the simpler fixed-point or floating-point representations).
TABLE III. Comparison of Number of Required Reference Patterns to Perform 16-Bit Full-Precision Addition and Multiplication Using Various Encoding Schemes*

                                                         Addition        Multiplication
Binary (without logical reduction)                       3.65 × 10^10    6.32 × 10^10
Binary (after logical minimization)                      3.28 × 10^?     1.43 × 10^?
Binary-coded residue (without logical reduction)         694             3252
Binary-coded residue (after logical minimization)        327             1183
Multilevel-coded residue (without logical reduction)     568             2540
Multilevel-coded residue (after logical minimization)    300             1067

*Two-, three-, and five-level coding have been used in the multilevel coding results.
Fig. 1 (top, upper middle, lower middle, and bottom portions)


Table I. Number of Required Reference Patterns for Residue Addition and Multiplication Using Different Levels of Coding
MACHINE VISION


ABSTRACT 

This report develops and explains work at SAIC on Machine Vision under the SDIO IS&T Optical Consortium program during the period from June to December 1985. An architecture for a machine vision system is developed and explained in the context of Bela Julesz's findings for natural vision. This architecture contains a sensor fusion function (modeled after Julesz's Cyclopean vision) and a pattern recognition function. Optical and electronic implementations of neural nets for stereopsis and pattern recognition are presented.
TECHNICAL SUMMARY


1.  Objectives 

Natural  vision  in  man  and  animals  greatly  outstrips  present  achievements  in 
machine  vision.  The  objective  of  this  effort  is  to  tap  the  existing  body 
of  knowledge  on  natural  vision,  and  apply  this  knowledge  to  machine  vision. 


2.  Description  of  Work  Performed  and  Results 

The neural architecture behind stereo vision in man is examined and applied to the general machine vision problems of sensor fusion and pattern recognition. Optical processing is identified as a candidate preprocessing technique in a machine algorithm for stereo vision due to Marr.

Recent work in neural networks is examined for application to pattern recognition. The aspect and noise tolerance of neural network models, as demonstrated in the work of Kohonen and Fukushima, is identified as a potential solution to difficulties with matched filtering approaches to pattern recognition. Optical and electronic concepts to implement neural networks (either at the functional or architectural levels) are presented.


3.  Conclusions  and  Recommendations 

Knowledge of natural vision in humans, and of related sensory systems in animals (monkeys, cats, bats), provides an extremely valuable basis for future development of machine vision. The exploitation of this knowledge will require 1) continued study of natural vision to provide additional insight and knowledge, and 2) continued effort by technologists to develop parallel optical and electronic processing frameworks for machine implementation of functions similar to those found in natural vision, such as sensor fusion and pattern recognition.
1.0 INTRODUCTION

This report develops and explains work at SAIC under the SDIO IS&T Optical Consortium program on Machine Vision during the period from June to December 1985.

We  use  the  term  "machine  vision"  to  encompass  the  generation  and  processing 
of  three-dimensional  or  two-dimensional  scenes  from  any  number  of  diverse 
sensors.  The  sensors  may  detect  radiation  in  any  band  from  x-ray  through 
radio  wavelengths,  and  may  be  configured  in  arbitrary  constellations  in 
space  or  on-board  aircraft.  Combinations  of  active  and  passive  sensors  may 
be  used.  For  example,  data  collected  by  an  active  inverse  synthetic 
aperture  radar  (inverse  SAR)  and  by  a  passive  infrared  (IR)  telescope  might 
be  fused  by  the  machine  vision  system  to  generate  a  3-D  representation  of  a 
target. 




We  have  identified  many  approaches  to  the  fusion  of  data  from  diverse 
sensors.  Three  broad  categories  of  techniques  for  sensor  fusion  and 
processing  can  be  identified: 


1) man in the loop

2)  statistical  decision  theory  methods 

3)  artificial  intelligence  (AI) 

In the "man in the loop" approach, the data from the sensor(s) are prepared in graphical form for an expert to assess. If two types of sensors are involved, the imagery from each sensor might be displayed in registration (overlay). Since inverse SAR images exist in the range and one cross-range dimension, while conventional telescope images exist over the two cross-range dimensions, these types of images should be combined as orthogonal projections. The human analyst brings to the data his knowledge of likely target characteristics and his understanding of image artifacts, which might arise from glints, radar echoes/waveguiding, or a poor target motion solution. The obvious problem with man-in-the-loop approaches is the difficulty of satisfying the demanding timelines required for SDI applications.
The statistical decision theory approach to vision has provided substantial progress in the area of pattern recognition; see, for example, the volume of reprints edited by Agrawala, Machine Recognition of Patterns. Less progress has been made in the area of sensor fusion. However, the Kalman filter technique provides a methodology for combining diverse information to refine an estimate of the state vector. For example, Wood applied Kalman filtering to the problem of image reconstruction from projections in computed tomography (A System Theoretic Approach to Image Reconstruction, Ph.D. thesis, Stanford, 1978). In the computed tomographic application, the Kalman filter provides a natural way to "fuse" hundreds of x-ray projection measurements made at different aspect angles. This is true "tomographic" vision. Moreover, the Kalman filter processes the measurement data in an optimal way, taking into account the known noise statistics. Unfortunately, in x-ray tomography applications, "optimal estimation" requires so much computation that it is not feasible for clinical use (Buonocore et al.). Surely, the methods of statistical decision theory and optimal estimation have an important role to play in machine vision. However, if the noise statistics are distinctly non-Gaussian, or if the computational load of these now-classical techniques is prohibitive, a novel approach is required.
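The way a Kalman filter "fuses" projections can be sketched with the standard measurement update for a static state. Each projection z = h·x + noise refines the estimate, and each update is weighted by the current covariance and the measurement noise variance. This is a generic textbook update and not Wood's formulation. The 2-D point, the four aspect angles, and the noise variance are hypothetical values chosen for illustration.

```python
import math


def kalman_update(x, P, h, z, r):
    """Scalar-measurement Kalman update for a static state (no time propagation).

    x: state estimate (list of n floats); P: n x n covariance (list of lists);
    h: measurement row vector, z = h . x + noise; r: measurement noise variance.
    """
    n = len(x)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]        # P h^T
    s = sum(h[i] * Ph[i] for i in range(n)) + r                             # innovation variance
    k = [Ph[i] / s for i in range(n)]                                       # Kalman gain
    innovation = z - sum(h[i] * x[i] for i in range(n))
    x_new = [x[i] + k[i] * innovation for i in range(n)]
    P_new = [[P[i][j] - k[i] * Ph[j] for j in range(n)] for i in range(n)]  # (I - k h) P
    return x_new, P_new


# Hypothetical example: fuse noiseless projections of a 2-D point taken at four aspect angles.
truth = [2.0, -1.0]
x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]   # vague prior
for angle_deg in (0, 45, 90, 135):
    t = math.radians(angle_deg)
    h = [math.cos(t), math.sin(t)]
    z = h[0] * truth[0] + h[1] * truth[1]
    x, P = kalman_update(x, P, h, z, r=0.01)
print([round(v, 2) for v in x])
```

No single projection determines the point. The estimate converges only after measurements from different aspect angles are combined, which is the sense of "tomographic" fusion used above.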

This explains the interest in artificial intelligence (AI) for machine vision. But what sort of AI? The current emphasis in AI is on expert-based or rule-based inference systems. Expert-based AI systems can no doubt be created which draw upon the existing knowledge of sensor and target characteristics. But the semantic domain of a rule base is very far removed from the raw image data generated by an inverse SAR or IR sensor. Rule-based AI technology does not address the basic problems of pattern recognition, image fusion (as in stereo vision), or general sensor fusion. Since these visual processes are carried out at an unconscious level, we are not aware of what rules to program in a rule-based system for vision. Once the "visual" phases of the sensor fusion problem are solved, a rule-based system might be applicable for subsequent functions. However, the fact that natural vision transpires at an unconscious level strongly suggests that rule-based inference machines are poorly matched to the machine vision problem per se.
We are consequently led to an alternative approach to artificial intelligence, which draws upon what is known about natural intelligence. Fortunately, our understanding of natural vision has a respectable base in the work of Helmholtz, Wheatstone, Julesz, Hubel and Wiesel, Marr, and many others.

The work of Julesz on stereo vision (Foundations of Cyclopean Perception), for example, provides valuable clues for the development of an architecture for a machine vision system capable of stereopsis. Julesz showed through his studies with random-dot stereograms that stereopsis is obtained prior to pattern recognition. Julesz demonstrated this by creating random-dot stereograms that provide no monocular clues to the pattern seen when the stereogram is fused. For the reader who is not familiar with Julesz's work, an example of his random-dot stereograms is provided in Figure 1. To fuse the stereo pair, cross your eyes so that the left and right random-dot patterns are superimposed. If you succeed in obtaining stereopsis, you will see a pattern in the stereo view which is not evident under monocular viewing.
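One common way to construct such a stereogram is sketched below. The right image copies the left one, except that a central region of dots is displaced horizontally. The strip that this displacement uncovers is refilled with fresh random dots, so each image on its own remains featureless. A square region and the image size, square size, and shift are hypothetical choices for this sketch; Figure 1 uses a diamond, and the sign convention for perceived depth is not addressed here.

```python
import random


def random_dot_stereogram(size=64, square=24, shift=4, seed=0):
    """Left/right random-dot images (lists of 0/1 rows) hiding a displaced square."""
    rng = random.Random(seed)
    lo = (size - square) // 2          # first row/column of the hidden square
    hi = lo + square                   # one past its last row/column
    if shift > lo:
        raise ValueError("shift must not move the square past the image edge")
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]
    for r in range(lo, hi):
        for c in range(lo, hi):          # displace the square's dots horizontally
            right[r][c - shift] = left[r][c]
        for c in range(hi - shift, hi):  # refill the uncovered strip with new dots
            right[r][c] = rng.randint(0, 1)
    return left, right


left, right = random_dot_stereogram()
print(sum(a != b for rl, rr in zip(left, right) for a, b in zip(rl, rr)), "dots differ")
```

The only difference between the two images is a horizontal disparity inside the square. Fusing them is what reveals the square, in line with Julesz's point that stereopsis precedes pattern recognition.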

We  have  generalized  this  result  for  the  case  of  multiple  sensor  fusion  in 
Figure  2,  which  presents  a  general  architecture  for  3-D  scene  generation 
and  pattern  recognition  that  is  consistent  with  Julesz's  findings  for 
stereopsis.  The  machine  vision  program  which  we  have  mapped  out  under  this 
initial  SDIO  IS&T  effort  encompasses  both  the  further  development  of  a 
machine  vision  architecture,  and  the  development  of  the  individual 
functions  identified  in  this  architecture,  e.g.,  machine  sensor  fusion,  and 
pattern  recognition. 

An indication of the potential for machine sensor fusion is provided by the work of Marr (Vision) on stereopsis. Our goal is to find more general solutions for trinocular vision as well as binocular vision. Multiple-sensor fusion to clear up ambiguities present in ordinary stereo vision of complex 3-D scenes might be called "tomographic vision." This is illustrated in Figures 3 and 4. The term tomography is used in the field of x-ray imaging to identify various multiple-source geometry techniques for obtaining 3-D image structure. (For a discussion of tomography, see B. G. Ziedses des Plantes, "Body-section radiography: History, image information, various techniques, and results," Australas. Radiol., Vol. 15, pp. 57-64, 1971, or books by M. Ter-Pogossian or H. H. Barrett on medical x-ray imaging.)

Figure 1. A random-dot stereogram which, when viewed monocularly, appears as an aggregate of random dots. However, when stereoscopically fused, a diamond is perceived hovering over the random background. (Taken from Bela Julesz, Foundations of Cyclopean Perception, p. xi.)

Applicable  work  on  pattern  recognition  for  machine  vision  has  been  done  by 
the  optical  processing  community  for  many  years.  Recently,  an  alternative 
approach  based  on  neural  networks  has  also  shown  great  promise.  Beginning 
with  embryonic  ideas  in  the  United  States  on  the  "perceptron"  (F. 
Rosenblatt,  Principles  of  Neurodynamics),  workers  in  the  neural  network 
field  have  been  seeking  to  understand  how  neural  networks  can  perform 
pattern recognition. Fukushima in Japan has had some success with a neural
network model he has developed under the name "neocognitron."

Another approach for pattern recognition using neural networks is found in
the associative memory work of Kohonen (Self-Organization and Associative
Memory).

Both the work of Fukushima and that of Kohonen show advantages over traditional
matched filtering for pattern recognition. The difficulty with matched
filtering is its intolerance to object distortion or aspect changes.
Fukushima and Kohonen have demonstrated neural network models which can
adapt to such changes.


2.0  MACHINE  VISION  AND  NATURAL  VISION 

The performance of natural vision in man and in many animal species exceeds
the capability and know-how of present machine vision technology. Our
curiosity is piqued: what are the processes, forms and functions of
natural vision systems? Specific high-level capabilities of natural vision
that are relevant to machine vision include:
• detection of a given class of objects in the presence of a
disturbed or cluttered background

•  stereo  vision 

•  sensory  fusion,  e.g.,  cuing  of  vision  by  hearing 

•  reliable  performance  with  noisy  components  (neurons) 

Low  level  features  of  natural  vision  that  are  of  interest  include: 

•  retinal  readout  and  pre-processing  mechanisms 

•  dynamic  range  accommodation 

• flat fielding to achieve uniform response from non-uniform detectors

In the quest for machine vision systems that can outperform natural
systems, knowledge of natural vision can be exploited in several ways.
The mere existence of human vision capabilities, packaged compactly and
powered efficiently, demonstrates what may be achieved and motivates our
search for more knowledge about vision.

In the case of radar vision, the existence proof is provided by bats. The
sound-locating apparatus of bats weighs only a fraction of a gram, yet can
do the job of radar equipment weighing hundreds of kilograms. Among the
sophisticated functions performed by bats are ranging and near-optimal
pursuit of prey (mosquitoes, moths), Moving Target Indication (MTI), target
discrimination, accurate direction finding, and non-interfering operation
in the presence of a great number of other bats (M. Razmakhnin, Radar Made
Easy). Behavioral experiments show that bats can "hear" surface
roughness introduced by 20-50 micrometer deep scratches. Remarkably,
laboratory investigations of neuron responses in the bat inferior
colliculus (one of the way-stations for nerve impulses from the cochlea to
the auditory cortex) have shown the presence of neurons that can detect a
15 microsecond time lag in the arrival of a sound pulse at the left and right
ears (Gerhard Neuweiler, "How bats detect flying insects").
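To make the 15 microsecond figure concrete, the sketch below converts a left/right arrival lag into a source bearing with the standard far-field relation lag = (d/c) sin(theta). The ear separation d = 2 cm and the speed of sound c = 343 m/s are assumptions for illustration only, not values from the text; under them, a 15 microsecond lag corresponds to roughly 15 degrees off axis.

```python
import math

# Assumed speed of sound in air near 20 degrees C; not given in the text.
SPEED_OF_SOUND_M_PER_S = 343.0


def bearing_from_lag(lag_s: float, ear_separation_m: float) -> float:
    """Source bearing (degrees from straight ahead) for a given left/right arrival lag.

    Far-field approximation: lag = (d / c) * sin(theta), with d the ear
    separation and c the speed of sound.
    """
    s = SPEED_OF_SOUND_M_PER_S * lag_s / ear_separation_m
    if abs(s) > 1.0:
        raise ValueError("lag is larger than this ear separation allows")
    return math.degrees(math.asin(s))


# The 15 microsecond lag is from the text; the 2 cm ear separation is hypothetical.
angle_deg = bearing_from_lag(15e-6, 0.02)
print(f"15 us lag -> about {angle_deg:.1f} degrees off axis")
```

With a smaller (hypothetical) head, the same lag maps to a larger angle, since sin(theta) scales as 1/d.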


Knowledge of machine vision can assist in our interpretation of natural
vision, and vice versa. As the principles underlying natural vision are
gradually revealed, machine vision workers can try to imitate these
principles. The actual implementations need not, however, be direct
imitations; once the processes are understood, the best implementation is
likely to use strategies and devices that have eluded biological evolution.

For example, present work in artificial intelligence emphasizes expert
(knowledge-based) systems that operate in specialized subjects. The rules
upon which such systems operate are combined logically in "inference
machines," providing chains of deductions that mimic the expert's reasoning.
This approach to artificial intelligence (AI) has proven useful for certain
applications. But the partitioning of the AI system into a "rule base" and
an "inference machine" may be a misleading paradigm for actual human
thought. In fact, there is evidence from work on associative memories that
the "rules" humans seem to apply in many situations may actually emerge
from the cumulative associations of stimulus patterns in an associative
memory (Anderson and Hinton, Parallel Models of Associative Memory, p. 21).


Since humans carry out most visual processes at an unconscious level, it is
likely that the neural architecture supporting vision is very well adapted
to this task and provides a good architectural basis for machine systems
to imitate. On the other hand, human capability for logical or deductive
processes is not impressive compared with present-day computer
capabilities, and consequently the neural architecture supporting conscious
thought may not be a good candidate for imitation. In summary, we find
motivation for the comparative study of natural and machine vision systems
because of the:

1) existence proofs provided by human vision capabilities

2) complementary understanding derived from knowledge of both machine
and natural vision systems

3) possibility of imitation, either at the algorithm or architectural level.


3.0 SENSOR FUSION


Natural stereo vision is the most familiar example of sensor fusion by a
neural network. Other examples are precise direction finding in bats
through binaural hearing, and, in cats, the registration in the visual cortex
of neurons mapping visual space with neurons mapping auditory space (G.
Neuweiler, "How bats detect flying insects"; also G. Hinton and J.
Anderson, Parallel Models of Associative Memory, p. 39).

Algorithms for binocular stereo vision have been developed by Marr and his
colleagues, and are presented in Marr's book Vision. We shall concern
ourselves here with the "zero-crossing" algorithm, which is applicable for
resolved surfaces. The extraction of depth information from a pair of
stereo images involves simple geometry once corresponding features in the
left and right images are identified. The problem Marr solved was to find
an automatic way in which corresponding features could be paired up
without making errors in the presence of closely spaced, similar-looking
features. Marr's algorithm combines two strategies. One strategy is to
look for edge-like features and to ignore grey-level (image intensity)
structure. The edge-like features are brought out by applying a
second-order differential filter to the images. Sensitivity to edges
without regard to direction is achieved by using a rotationally symmetric
differential operator, the Laplacian, denoted by ∇². Edges then show up as
zero-crossings (sign changes) of the filtered image, which gives the
algorithm its name. The other strategy is to conduct the search for
corresponding features with a very blurry form of the images first, and,
after this step has resolved the rough depth structure, to look for more
corresponding features in successively less blurred images, until full
resolution is reached.
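The edge-detection step can be sketched in one dimension: convolve a signal with the second derivative of a Gaussian (the 1-D analogue of the Laplacian of a blurred image) and report where the result changes sign. This is a minimal numpy illustration under that 1-D simplification, not Marr's 2-D implementation; the function names, the 4-sigma kernel truncation and the tolerance are assumptions made here.

```python
import numpy as np


def second_derivative_of_gaussian(sigma: float) -> np.ndarray:
    """Sampled 1-D second derivative of a Gaussian (1-D analogue of the Laplacian of a Gaussian)."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    kernel = (x**2 / sigma**4 - 1 / sigma**2) * g
    return kernel - kernel.mean()  # zero sum, so flat regions give (numerically) zero response


def zero_crossings(signal: np.ndarray, sigma: float) -> np.ndarray:
    """Indices i where the filtered signal changes sign between samples i and i + 1."""
    kernel = second_derivative_of_gaussian(sigma)
    r = len(kernel) // 2
    padded = np.pad(signal.astype(float), r, mode="edge")  # avoid false edges at the borders
    response = np.convolve(padded, kernel, mode="valid")   # same length as the input
    tol = 1e-9 * np.max(np.abs(response))
    signs = np.sign(np.where(np.abs(response) < tol, 0.0, response))
    return np.nonzero(signs[:-1] * signs[1:] < 0)[0]


step = np.zeros(100)
step[50:] = 1.0  # a single step edge between samples 49 and 50
print(zero_crossings(step, sigma=2.0))
```

Increasing sigma plays the role of the "very blurry form of the images": fewer, coarser zero-crossings survive.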

By working in stages, the density of features that could be confused during
pairing is kept low enough at each stage to avoid mispairing features in the
left and right images. This procedure is illustrated in Figure 5.
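The staged search can be sketched as follows: estimate the shift between left and right signals over a wide range at the blurriest scale, then at each finer scale search only a few samples around the previous estimate. For brevity this hypothetical 1-D sketch matches blurred intensity by mean squared difference rather than matching zero-crossing maps, and it assumes a single global disparity; the scales and search ranges are illustrative choices, not values from the text.

```python
import numpy as np


def blur(signal: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian blur with edge padding; output has the same length as the input."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    return np.convolve(np.pad(signal, radius, mode="edge"), g, mode="valid")


def best_shift(left: np.ndarray, right: np.ndarray, candidates) -> int:
    """Shift d minimizing the mean squared difference between left[i] and right[i + d]."""
    n = len(left)

    def cost(d: int) -> float:
        a, b = (left[: n - d], right[d:]) if d >= 0 else (left[-d:], right[: n + d])
        return float(np.mean((a - b) ** 2))

    return min(candidates, key=cost)


def coarse_to_fine_disparity(left, right, sigmas=(8.0, 4.0, 2.0, 1.0), wide=20, narrow=2) -> int:
    """Search widely at the blurriest scale, then only near the previous estimate."""
    estimate = 0
    for level, sigma in enumerate(sigmas):
        reach = wide if level == 0 else narrow
        candidates = range(estimate - reach, estimate + reach + 1)
        estimate = best_shift(blur(left, sigma), blur(right, sigma), candidates)
    return estimate


rng = np.random.default_rng(0)
base = rng.standard_normal(420)   # random texture, like a random-dot pattern
left = base[10:410]
right = base[3:403]               # right[i] == left[i - 7]: a true disparity of 7 samples
print(coarse_to_fine_disparity(left, right))
```

The narrow search at fine scales is what keeps closely spaced, similar-looking features from being mispaired.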

An optical implementation of this algorithm seems possible, since the
blurring and the Laplacian filtering that precedes zero-crossing detection
are linear operations that can be implemented with coherent or incoherent
optical processors (Goodman, Introduction to Fourier Optics; Stoner,
"Incoherent optical processing via spatially offset pupil masks," Applied
Optics 17, p. 2454, 1978).
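The reason a linear filter maps onto an optical processor is the convolution theorem: blurring in the image plane equals multiplying the spectrum by a transfer function (the Fourier-plane mask of a coherent 4f system). The numpy sketch below is a digital analogue only; it checks that a Gaussian blur done by summing shifted copies matches the same blur done as transform, multiply by mask, inverse transform. Circular boundary conditions and the helper names are assumptions of this sketch.

```python
import numpy as np


def gaussian_weights(sigma: float):
    """Integer offsets and normalized 1-D Gaussian weights out to four standard deviations."""
    radius = int(np.ceil(4 * sigma))
    offsets = np.arange(-radius, radius + 1)
    g = np.exp(-offsets.astype(float) ** 2 / (2 * sigma**2))
    return offsets, g / g.sum()


def blur_direct(image: np.ndarray, sigma: float) -> np.ndarray:
    """Spatial-domain circular convolution: a weighted sum of shifted copies of the image."""
    offsets, g = gaussian_weights(sigma)
    out = np.zeros(image.shape)
    for dy, wy in zip(offsets, g):
        for dx, wx in zip(offsets, g):
            out += wy * wx * np.roll(image, (dy, dx), axis=(0, 1))
    return out


def blur_fourier(image: np.ndarray, sigma: float) -> np.ndarray:
    """Same blur done as in a 4f processor: transform, multiply by a mask, transform back."""
    offsets, g = gaussian_weights(sigma)
    kernel = np.zeros(image.shape)
    for dy, wy in zip(offsets, g):
        for dx, wx in zip(offsets, g):
            kernel[dy % image.shape[0], dx % image.shape[1]] += wy * wx
    mask = np.fft.fft2(kernel)  # plays the role of the Fourier-plane transfer function
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


image = np.random.default_rng(0).random((32, 32))
print(np.allclose(blur_direct(image, 1.5), blur_fourier(image, 1.5)))
```

A Laplacian-of-Gaussian mask could be substituted for the Gaussian one in the same way; the sign-change detection that follows is not linear and is done after readout in this sketch.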

A diagram of the zero-crossing stereopsis algorithm is shown in Figure 6.
With the possible exception of the final depth disparity computations, the
operations can be implemented optically: Gaussian blurring, zero-crossing
detection with the Laplacian, and cross-correlation of right and left
zero-crossing images to determine depth disparity.
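The correlation step can be sketched with the Fourier-domain product that an optical correlator computes: the cross-correlation of the left and right zero-crossing maps peaks at the shift between them. In this hypothetical sketch a sparse random binary image stands in for a real zero-crossing map, the images are assumed rectified so only horizontal shifts are searched, and a single global disparity is found (a real system would correlate local windows).

```python
import numpy as np


def horizontal_disparity(left: np.ndarray, right: np.ndarray) -> int:
    """Shift d maximizing the circular correlation sum over (y, x) of left[y, x] * right[y, x + d].

    Computed through the Fourier domain: conj(F{left}) * F{right}.
    Only the zero vertical-shift row is searched (rectified images assumed).
    """
    corr = np.real(np.fft.ifft2(np.conj(np.fft.fft2(left)) * np.fft.fft2(right)))
    d = int(np.argmax(corr[0]))
    width = left.shape[1]
    return d if d <= width // 2 else d - width  # map to a signed shift


rng = np.random.default_rng(0)
left_zc = (rng.random((64, 64)) < 0.1).astype(float)  # stand-in for a binary zero-crossing map
right_zc = np.roll(left_zc, 5, axis=1)               # the same features displaced 5 pixels
print(horizontal_disparity(left_zc, right_zc))
```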

4.0  IMPLEMENTATION  OF  NEURAL  NETWORKS  FOR  PATTERN  RECOGNITION 

Implementation of neural nets can be made at the functional level or at the
architectural level. The function of a neural net might be equivalent to
iterative matrix-vector multiplication and thresholding. Various
electronic or optical matrix-vector multipliers might be used, in either
analog or digital implementations. Since analog optical matrix-vector
multipliers have been developed at a number of laboratories (NOSC,
Stanford, Carnegie-Mellon, Caltech), we shall not delve into this approach
here. Rather, we shall look into the possibility of architectural-level
implementations of neural net concepts. One such architecture supports
associative memories which can perform pattern recognition with tolerance
to aspect angle and noise. These properties are likely to be important to
target identification and discrimination in an SDI system. See Figures 7
and 8.
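The functional-level description (iterative matrix-vector multiplication and thresholding) can be sketched as an outer-product autoassociative memory of the Hopfield type, which is one common instance of this idea rather than a method given in the text. Three random +/-1 patterns are stored, one is corrupted in 10% of its elements, and the state is updated as x <- sign(W x) until it stops changing. Pattern sizes, the seed and the function names are assumptions of this sketch.

```python
import numpy as np


def outer_product_weights(patterns: np.ndarray) -> np.ndarray:
    """W = sum_k p_k p_k^T with the diagonal cleared; rows of `patterns` are +/-1 vectors."""
    w = patterns.T @ patterns
    np.fill_diagonal(w, 0)
    return w


def recall(w: np.ndarray, probe: np.ndarray, max_iters: int = 20) -> np.ndarray:
    """Repeat x <- threshold(W x) until the state stops changing (or max_iters is reached)."""
    x = probe.copy()
    for _ in range(max_iters):
        nxt = np.where(w @ x >= 0, 1, -1)  # matrix-vector product, then hard threshold
        if np.array_equal(nxt, x):
            break
        x = nxt
    return x


rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 100))        # three stored 100-element patterns
w = outer_product_weights(patterns)
noisy = patterns[0].copy()
noisy[rng.choice(100, size=10, replace=False)] *= -1  # corrupt 10% of the elements
restored = recall(w, noisy)
print(np.array_equal(restored, patterns[0]))
```

Each iteration is exactly the operation an optical matrix-vector multiplier with a thresholding detector array would perform.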

Figure 7. Demonstration of associative recall with tolerance to aspect change.
Ten classes of pictures were stored. Within each class, the pictures are
images of a face taken at 5 different aspect angles. Part (d) shows a test
image of the person in (a) and (b), taken from an angle not originally
stored. Part (c) shows correct classification of (d) by the associative
memory. (Taken from Kohonen, Lehtio and Oja, "Distributed Associative
Memory," in Parallel Models of Associative Memory, edited by G. Hinton and
J. Anderson.)

Figure 8. Demonstration of autoassociative recall with incomplete or noisy
inputs. Parts (a) through (d) show 4 of 100 stored images. Parts (e) and (f)
show an incomplete input which is correctly associated, and parts (g) and
(h) show a noisy input which is correctly associated. (Taken from Kohonen,
Lehtio and Oja, "Distributed Associative Memory," in Parallel Models of
Associative Memory, edited by G. Hinton and J. Anderson.)

Figure 9. The Learning Matrix. Inputs come in along the e-lines, outputs along
the b-lines. The connections (shown in four places as resistors) must be
adaptable for learning to occur. Training is performed by forcing the
b-lines in conjunction with stimuli along the e-lines, with Hebbian
adaptation of the connections. Thresholding on the output lines is indicated
by the triangular symbols. (Figure taken from Kohonen, Self-Organization and
Associative Memory.)

An associative memory may be constructed in a simple way following
Steinbuch (see Kohonen, Self-Organization and Associative Memory, p. 73).
The Steinbuch Learning Matrix (Figure 9) is a system of crossing signal
paths, with an adaptive connection at each crossing. The learning
mechanism is extremely simple. During a training session the proper output
response is applied externally to the b-lines for each of a set of
representative input stimuli applied to the e-lines. Those connections
between the input and output which are simultaneously active (positively
correlated) during the training session are strengthened. That's all there
is to it. This learning mechanism of strengthening the connections that
are positively correlated is called "Hebbian," since Hebb hypothesized that
biological neural networks learn by strengthening those synapses which
connect concurrently active neurons.
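The training rule just described can be sketched in a few lines: each crossing weight grows whenever its e-line (input) and b-line (forced output) are active together, which is an outer-product update. In this hypothetical sketch the output thresholds of Figure 9 are replaced by a winner-take-all choice of the most excited b-line, and the `bits` helper and the three 16-bit prototypes are made up for illustration.

```python
import numpy as np


class LearningMatrix:
    """Sketch of a Steinbuch-style crossbar: input e-lines cross output b-lines.

    Each crossing holds an adaptive weight. Training strengthens a connection
    whenever its e-line and b-line are active together (Hebbian rule).
    """

    def __init__(self, n_e_lines: int, n_b_lines: int):
        self.weights = np.zeros((n_b_lines, n_e_lines))

    def train(self, stimulus: np.ndarray, forced_response: np.ndarray) -> None:
        # Outer product: weight[b, e] grows only where both lines are active (value 1).
        self.weights += np.outer(forced_response, stimulus)

    def respond(self, stimulus: np.ndarray) -> int:
        # Each b-line sums its weighted inputs; the most excited line wins
        # (a winner-take-all stand-in for the output thresholds).
        return int(np.argmax(self.weights @ stimulus))


def bits(*on, n=16):
    """Hypothetical helper: a 0/1 vector with the listed positions set."""
    v = np.zeros(n)
    v[list(on)] = 1.0
    return v


memory = LearningMatrix(n_e_lines=16, n_b_lines=3)
classes = [bits(0, 1, 2, 3, 4), bits(5, 6, 7, 8, 9), bits(10, 11, 12, 13, 14, 15)]
for label, prototype in enumerate(classes):
    memory.train(prototype, np.eye(3)[label])  # force b-line `label` high during training

probe = bits(0, 1, 2, 3, 5)  # class 0 with one bit missing and one spurious bit
print(memory.respond(probe))
```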


It has proven very difficult to study synapse growth directly. However,
there is evidence that nerve growth is stimulated by electrical current and
by the concentration of "nerve growth" chemicals (Borgens et al., "Enhanced
Spinal Cord Regeneration in Lamprey by Applied Electric Fields," Science
213, p. 611, 1981; and Barnes, "What Makes Nerves Regenerate," Science 230,
p. 1024, 1985).

The  practical  question  is:  What  sort  of  mechanisms  might  we  invent  or 
develop  to  realize  Hebbian  connections  in  an  optical  or  electronic  neural 
network?  Since  we  require  positive  correlation  in  the  activity  on  either 
side  of  a  synaptic  connection,  we  are  looking  for  a  synapse  weight  factor 
that  grows  as  <xy>  where  x  is  the  activity  on  one  side  of  the  synapse,  y  is 
the  activity  on  the  other  side,  and  the  brackets  <  >  indicate  averaging 
over  many  stimulus  response  cycles. 

(Of course, there are other possible candidate growth factor relationships,
such as <f(x) g(y)> where f(x) and g(y) are monotonic functions, but the xy
term would enter into the Taylor series expansion of f(x) g(y), and so we
are capturing the essential factor if we restrict ourselves to <xy>.)
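Written out, assuming f and g are smooth near zero activity, the expansion shows where the <xy> term comes from: f'(0) g'(0) <xy> is the lowest-order term that couples the two sides of the synapse, while the lower-order terms depend on one side only.

```latex
f(x)\,g(y) = \left[f(0) + f'(0)\,x + \tfrac{1}{2}f''(0)\,x^{2} + \cdots\right]
             \left[g(0) + g'(0)\,y + \tfrac{1}{2}g''(0)\,y^{2} + \cdots\right]

\langle f(x)\,g(y)\rangle = f(0)\,g(0) + f'(0)\,g(0)\,\langle x\rangle + f(0)\,g'(0)\,\langle y\rangle
                            + f'(0)\,g'(0)\,\langle xy\rangle + \cdots
```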

Holography  provides  a  candidate  implementation  of  Hebbian  learning,  because 
the  formation  of  a  hologram  depends  on  the  product  xy  in  the  interference 
between  two  beams;  call  them  the  stimulus  beam  x  and  the  response  beam  y. 
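The product term in holographic recording comes from the interference identity |x + y|^2 = |x|^2 + |y|^2 + 2 Re(x y*). In the toy numpy model below, x and y are complex field amplitudes at one point of the medium over many exposures (unit amplitudes and random phases are assumptions). When the two beams share the same phase jitter (correlated), the averaged cross term survives; when their phases are independent, it averages toward zero, which is the <xy>-like behavior the Hebbian analogy relies on.

```python
import numpy as np


def recorded_intensity(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Intensity written into the medium by two interfering complex fields x and y."""
    return np.abs(x + y) ** 2


def cross_term(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """The part of |x + y|^2 that depends on both beams: 2 Re(x y*)."""
    return 2 * np.real(x * np.conj(y))


rng = np.random.default_rng(1)
n = 10_000
phase = rng.uniform(0, 2 * np.pi, n)
x_c = np.exp(1j * phase)  # stimulus and response share the same phase jitter
y_c = np.exp(1j * phase)
x_u = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # independent phases
y_u = np.exp(1j * rng.uniform(0, 2 * np.pi, n))

# The bias terms |x|^2 + |y|^2 are the same in both cases; only the cross term
# (the grating) distinguishes correlated from uncorrelated exposures.
print(cross_term(x_c, y_c).mean(), cross_term(x_u, y_u).mean())
```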

Four-wave mixing materials such as BSO and barium titanate might be
utilized for real-time, adaptive holographic associative memories. See the
set-up in Figure 10 (J. Huignard et al., "Phase-conjugate wavefront
generation via real-time holography in Bi12SiO20 crystals"). A nonlinear,
iterative associative memory may be implemented by placing the linear
holographic memory in a laser cavity, as shown in Figure 11. For a related
concept, see D. Pepper, Scientific American, Jan. 1986, p. 79.
Figure 10. Application of real-time holography to form a linear associative
memory.
Figure 11. Nonlinear associative memory concept. Stimulus/response pairs
are stored as holograms in the BSO crystal. The phase conjugate
mirrors and the holograms constitute a laser resonator, with the
phase conjugate mirrors providing system gain. Mode selection
is triggered by the input stimulus, and the system settles into
the stimulus/response mode which is closest to the input
stimulus, as this mode has the highest local maximum in system
gain.


In analogy to the formation of rock connections in caves between
stalactites and stalagmites, we can imagine connections growing between the
input and output paths of a Learning Matrix (Steinbuch). Kohonen mentions
the memistor as one such automatic connection-forming device. Another
(more attractive) possibility mentioned by Kohonen is the use of FET
devices as variable resistors in a Learning Matrix. We point out that
programming of the Learning Matrix could then be achieved optically, using
an optically addressed FET gate. Here we seek to exploit the high
sensitivity of the FET to small changes in gate voltage, and the attractive
feature of low cross-talk between the optical addressing signals and the
electronic pulses used to operate the Learning Matrix. A slightly
different idea is to use a slowly responding photoconductor at connecting
nodes. After a programming exposure with an appropriate visible light
pattern, a photoconductor such as cadmium sulfide will retain enhanced
conductivity for many electronic readout cycles before needing to be
refreshed. Of course, this refresh exposure can also be used to re-program
the previous connection strengths (see Figure 12).
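Electrically, a resistive Learning Matrix reads out by Ohm's law and Kirchhoff's current law: with the output lines held at virtual ground, output line j collects I_j = sum over i of G_ij V_i. The sketch below assumes hypothetical dark and light-enhanced photoconductor conductances (1 nS and 1 uS) and sets each crossing by a Hebbian count of coincident input/output activity during training; neither the values nor the readout circuit are specified in the text.

```python
import numpy as np

# Hypothetical conductances: dark photoconductor vs. light-enhanced connection.
G_DARK_S = 1e-9
G_SET_S = 1e-6


def set_connections(stimuli: np.ndarray, responses: np.ndarray) -> np.ndarray:
    """Conductance at each e-line/b-line crossing after a Hebbian training exposure.

    stimuli: (n_examples, n_inputs) of 0/1; responses: (n_examples, n_outputs) of 0/1.
    A crossing is exposed once for every example in which its input and output are both active.
    """
    coincidences = stimuli.T @ responses  # shape (n_inputs, n_outputs)
    return G_DARK_S + G_SET_S * coincidences


def output_currents(conductance: np.ndarray, input_volts: np.ndarray) -> np.ndarray:
    """Current into each output line held at virtual ground: I_j = sum_i G_ij * V_i."""
    return input_volts @ conductance


stimuli = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
responses = np.eye(2)
g = set_connections(stimuli, responses)
currents = output_currents(g, 1.0 * stimuli[0])  # read out with 1 V on the active inputs
print(currents, (currents > 0.5 * G_SET_S).astype(int))
```

A slow photoconductor would let `g` drift back toward G_DARK_S between refresh exposures; this sketch ignores that decay.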


An all-optical implementation can be based upon the principle noted
earlier for implementing Hebbian synapse weights through a growth
mechanism proportional to the average product <xy>. This time, instead of
a holographic principle, we use a photo-bleaching effect which requires
light of two different wavelengths, λa and λb, to proceed. One possibility
is an optical pumping system, such as that diagrammed in Figure 13. The
initial optical connection node contains material in a metastable,
absorptive state. Exposure to light of λa and λb pumps the material to a
state which decays to a transparent state.
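Why two wavelengths give a product-like learning rule can be illustrated with a simple rate model. Assuming (this is not from the text) that the absorbing population N obeys dN/dt = -k I_a(t) I_b(t) N, the bleached fraction after training is 1 - exp(-k sum(I_a I_b) dt), so only exposures where both wavelengths are present at once contribute; for small doses the bleaching is approximately proportional to the summed product, i.e. to <xy>.

```python
import math


def bleached_fraction(exposure_a, exposure_b, rate=1.0, dt=1.0):
    """Fraction of a connection node switched from absorbing to transparent.

    Hypothetical rate law dN/dt = -rate * I_a(t) * I_b(t) * N, so the surviving
    absorbing fraction is exp(-rate * sum(I_a * I_b) * dt).
    """
    dose = sum(a * b for a, b in zip(exposure_a, exposure_b)) * dt
    return 1.0 - math.exp(-rate * dose)


x = [1, 0, 1, 0]  # lambda_a (stimulus side) over four training cycles
print(bleached_fraction(x, [1, 0, 1, 0]))  # correlated lambda_b (response side)
print(bleached_fraction(x, [0, 1, 0, 1]))  # anti-correlated lambda_b: no bleaching
```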


Figure 12. Learning Matrix implementation concept. The matrix connections
shown as variable resistors are made out of a slowly responding
photoconductor like cadmium sulfide. The photoconductive
connections are "set" by visible light exposure from proximity-coupled
LEDs during a training session. For the training
session the switches are as shown, and the LEDs are forward
biased by a combination of input stimuli and forced response
potentials. The switches are reversed for operation, leaving
the LEDs back-biased.


Figure 13. Optical pumping scheme requiring radiation at λa and λb to
switch media from absorbing to transparent state. (Diagram: initial state
(absorbing); final state (transparent).)
TECHNICAL TITLE: OPTICAL SYSTOLIC ARRAY PROCESSING USING A NOVEL INTEGRATED ACOUSTOOPTIC MODULE

ABSTRACT:

During the past six-month period (July 1 to December 31, 1985) very
significant progress has been made in the experimental phase of this
research. Specifically, the following two experimental projects were
successfully carried out:

1. Fabrication and Testing of Large Linear TIPE Microlens Array:

A linear microlens array which consists of 60 lens elements, each with a
30 µm aperture and 200 µm focal length, was successfully fabricated using the
titanium-indiffused proton-exchanged (TIPE) technique in a Y-cut LiNbO3
substrate. A simple matrix-vector multiplication experiment was also repeated
with encouraging results using the AO modulator module that incorporated this
microlens array.

2. Fabrication and Testing of Thermally-Annealed Proton-Exchanged
Channel Waveguide Cutoff Modulator:

An electro-optic cutoff modulator that utilizes a single-mode thermally-
annealed proton-exchanged channel waveguide in an X-cut LiNbO3 substrate has
been realized for the first time. In contrast to the earlier cutoff modulators
that exclusively utilized titanium-diffused channel waveguides, thermal
annealing was used to provide fine tuning of the refractive index changes,
which was essential in bringing the waveguides to the very edge of cutoff and
thus enabled further reduction in the drive voltage requirement. Thermal
annealing was also found to greatly improve both the linearity of modulation
and the resistance to optical damage. For example, a device with a 2-µm channel
width and a 5 mm electrode length has demonstrated a modulation depth as high
as 97% at a total voltage swing of only 7 volts, with very high linearity.
TECHNICAL SUMMARY

I. OBJECTIVES


The objective of this ONR/SDI-sponsored research is to advance the
performance characteristics of a compact integrated acoustooptic Bragg
modulator module with application to optical systolic array processing. The
integrated acoustooptic module consists of a single-mode channel-planar
composite waveguide, a TIPE microlens array, a SAW transducer, and a TIPE
integrating lens in a Y-cut LiNbO3 substrate. The two research tasks to be
performed are:

1. High-Speed Electrooptic Modulation of the Channel-Guided Light
Beams, and

2. Determination of the Channel Capacity of the Integrated Acoustooptic
Module.


II. WORK PERFORMED AND RESULTS

During the past six-month period (July 1 to December 31, 1985) very
significant progress has been made in the experimental phase of this
research. Specifically, the following two experimental projects were
successfully carried out:

1. Fabrication and Testing of Large Linear TIPE Microlens Array:

A linear microlens array which consists of 60 lens elements, each with a
30 µm aperture and 200 µm focal length, was successfully fabricated using the
titanium-indiffused proton-exchanged (TIPE) technique in a Y-cut LiNbO3
substrate. The measured performance of this microlens array was consistent
with that of the smaller arrays (with a considerably larger lens aperture and
a much longer focal length) fabricated previously. A simple matrix-vector
multiplication experiment involving a 2 x 2 matrix and a two-dimensional
vector was also repeated with encouraging results using the resulting AO
modulator module.

2. Fabrication and Testing of Thermally-Annealed Proton-Exchanged
Channel Waveguide Cutoff Modulator:

We have designed and fabricated the first electro-optic cutoff modulator
that utilizes a single-mode thermally-annealed proton-exchanged channel
waveguide in an X-cut LiNbO3 substrate. In contrast to the earlier cutoff
modulators reported by others that exclusively utilized titanium-diffused
channel waveguides, thermal annealing was used to provide fine tuning of the
refractive index changes, which was essential in bringing the proton-exchanged
waveguides to the very edge of cutoff and thus enabled further reduction in
the drive voltage requirement. Thermal annealing was also found to greatly
improve the linearity of modulation. For example, a device with a 2-µm channel
width and a 5 mm electrode length has demonstrated a modulation depth as high
as 97% at a total voltage swing of only 7 volts, with very high linearity. In
addition, no optical damage has been observed after a two-hour continuous
exposure to 6328 Å He-Ne laser light at an intensity as high as 10^ W/cm². It
is important to note that this simple channel waveguide cutoff modulator will
take up considerably less real estate and provide a higher degree of
linearity in the output light intensity than other existing modulators.

III.  CONCLUSIONS  AND  RECOMMENDATIONS 

The experiment on the linear microlens array described above has shown
that TIPE is a viable technique for fabrication of high-performance planar
waveguide microlenses, microlens arrays, and their combinations in LiNbO3
substrates using a single masking step. Using the 30 µm lens aperture as the
dimension for one basic channel, the channel capacity of the integrated AO
module will be 333 per cm along the SAW propagation path. The corresponding
sequential data rate for the SAW is approximately 100 Mbit/s.
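The two figures can be checked with simple arithmetic: 1 cm divided by a 30 µm channel pitch gives 333 channels, and if each channel width along the SAW path carries one bit, bits pass a fixed point at (SAW velocity)/(channel pitch). The SAW velocity of about 3488 m/s used below is an assumed textbook value for Z-propagating waves on Y-cut LiNbO3, not a number stated in the report; it gives roughly 116 Mbit/s, consistent with the quoted "approximately 100 Mbit/s".

```python
# One channel per microlens aperture (30 um, from the text).
LENS_APERTURE_UM = 30
# Assumed SAW velocity for Z-propagating waves on Y-cut LiNbO3; not stated in the text.
SAW_VELOCITY_M_PER_S = 3488.0

channels_per_cm = 10_000 // LENS_APERTURE_UM  # 10 000 um per cm
# If each channel width along the SAW path holds one bit, bits stream past a
# fixed point at (velocity / channel pitch) per second.
bit_rate_mbit_s = SAW_VELOCITY_M_PER_S / (LENS_APERTURE_UM * 1e-6) / 1e6

print(channels_per_cm, round(bit_rate_mbit_s))
```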

In regard to the thermally-annealed proton-exchanged channel waveguide
cutoff modulator explored in this research program, it should be emphasized
that the modulator will take up considerably less real estate and provide
a higher degree of linearity in the output light intensity than other
existing modulators. It is thus viable to construct such cutoff modulators in
an array configuration to facilitate multichannel operation in integrated- and
fiber-optic communication, computing, and signal processing systems. For
example, such a cutoff modulator array can be conveniently integrated with the
AO Bragg modulator module in the LiNbO3 channel-planar composite waveguide
referred to previously.

In  view  of  the  encouraging  experimental  results  as  described  above,  we 
strongly  recommend  continuation  of  the  research  project.  The  specific  tasks 
recommended  for  theoretical  and  experimental  studies  are  listed  below: 
Theoretical Study:

1.  Analysis  and  Design  of  TIPE  Microlenses  and  Lens  Array 

2.  Analysis  on  AO  and  EO  Bragg  Diffractions  in  Channel-Planar  Waveguide 
to  Determine  Ultimate  Performance  Such  As  Bandwidth,  RF  Drive  Power, 
Nonlinearity,  and  Dynamic  Range 

3.  Comparison  Between  AO  and  EO  Modulation  and/or  Multiplication 
Schemes 

4.  Identification  of  Existing  and  New  Architectures  and  Algorithms 

5.  Determination  of  Residual  Noise  and  Error  Rate 

Experimental  Study: 

1.  Perform  AO  Bragg  Diffraction  Experiments  at  Diode  Laser  Wavelength 

2. Perform Selective Optical Computing Experiments Using the Present AO
Modulation Scheme at 0.6328 µm and Diode Laser Wavelength

3.  Carry  Out  Optical  Computing  Experiments  Using  EO  Modulation  Scheme 

4.  Further  Integration  of  the  Basic  Modulator  Module  With  Diode  Laser 
(or  Optical  Fiber)  Array,  Photodetector  Array,  and  CCD  Driver 

5.  Ultimate  Realization  of  Integrated  Optic  Computer  or  Processor 
Modules  and  Their  Performance  Characterization  on  Accuracy,  Speed, 
Power  Consumption,  Error  Rate,  etc. 

TECHNICAL  DISCUSSION 

1.  SINGLE-MODE  TIPE  MICROLENSES  AND  MICROLENS  ARRAYS 

This principal investigator's group recently developed a new and simple
method which utilizes a combination of titanium-indiffusion (TI) and proton-
exchange (PE) processes for formation of planar-waveguide microlenses and
microlens arrays in LiNbO3 substrates. Waveguide lenses have been
recognized as among the basic components in Integrated Optics since the
inception of this now emerging technology, because of the high expectations
for optical communication, computing, and information processing systems to
be realized in a single waveguide substrate. Although a variety of planar
waveguide lenses, including the geodesic, chirp-grating, Fresnel, and Luneburg
types, had been reported in the literature, all of these lenses had thus far
been fabricated in the form of single lenses only. Also, a common
characteristic of these existing lenses has been the difficulty in obtaining
simultaneously a combination of all desirable lens characteristics. The TIPE
microlenses and linear lens arrays fabricated thus far have demonstrated a
combination of desirable properties including very short focal length, large
numerical aperture (low f-number), very small focal spot size, large angular
field of view, and low optical insertion loss at the 6328 Å He-Ne wavelength.
For fabrication of the single-mode microlenses and microlens arrays, the
well-established TI process was first applied in a Y-cut LiNbO3 substrate
to form a planar waveguide that supports a single TE mode and a single TM mode
of the lowest order. Subsequently, a masking material such as Si3N4 with a
designed lens contour was deposited on the TI waveguide (Figure 1a). The
sample was then immersed in molten benzoic acid at 230°C for six hours. As a
result of the selective proton exchange (PE), the region without the
masking material had its extraordinary refractive index increased by as much
as 0.11 in comparison to the remaining TI region (Figure 1b). Consequently,
this TIPE region of appropriate contour will function as a planar waveguide
lens. We have shown that by using the TIPE method a combination of
microlenses, microlens arrays, and composite lenses can be formed in the same
substrate using a single masking step. For example, a linear microlens
array and a large-aperture integrating lens were fabricated using the TIPE
method for realization of an integrated acousto-optic Bragg modulator module
in a LiNbO3 channel-planar composite waveguide (Fig. 2).

Our ONR/SDI-sponsored research is concerned with utilization of this
novel integrated AO module for optical systolic array processing. It is to be
noted that "Multiplication" and "Addition" are the two basic operations for
optical systolic array processing. In this case, "Multiplication" is
facilitated by AO Bragg diffraction, and "Addition" by the TIPE integrating
lens. Thus, by pulsing the data sequences separately into the multiple input
light beams and the SAW, high-speed digital filtering as well as matrix-vector
and matrix-matrix multiplications can be performed. A simple experiment on
matrix-vector multiplication involving a 2 x 2 matrix and a two-dimensional
vector has been demonstrated most recently. A variety of simple and
interesting experiments are being envisaged. Note that in such experiments
the output data is obtained by interfacing a CCD analog shift register with
the photodetector array. A number of variations to the basic configuration of
Fig. 2 are also possible. For example, an array of electro-optic Bragg
diffraction gratings is being fabricated to replace the grating created by the SAW.
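The digital-filtering mode can be sketched as an idealized tapped delay line: the SAW carries data samples past the fixed optical channels one channel spacing per clock, each channel's diffracted light is (beam amplitude) x (acoustic sample under it), and the integrating lens sums all channels onto one detector. The result is y[t] = sum over i of h[i] s[t - i], an FIR filter. This numpy sketch is an illustration only; it ignores the fact that optical intensities are non-negative (real systems need biasing or dual-rail encoding for signed data), and the function name and sample values are hypothetical.

```python
import numpy as np


def saw_tapped_delay_line(samples, beam_weights):
    """Idealized FIR filtering with the integrated AO module.

    Channel i sees the sample launched i clocks earlier. Bragg diffraction
    multiplies it by the channel's beam amplitude ("Multiplication"); the
    integrating lens sums all channels onto one detector ("Addition").
    """
    beam_weights = np.asarray(beam_weights, dtype=float)
    delay_line = np.zeros(len(beam_weights))  # acoustic sample under each channel
    outputs = []
    for s in samples:
        delay_line = np.roll(delay_line, 1)  # the SAW advances one channel spacing
        delay_line[0] = s                    # a new sample enters at the transducer end
        outputs.append(float(beam_weights @ delay_line))
    return np.array(outputs)


data = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
taps = [0.5, 0.25, 0.25]
print(saw_tapped_delay_line(data, taps))
```

The output matches a direct convolution of the data with the tap weights, truncated to the input length.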
2. SINGLE-MODE THERMALLY-ANNEALED PROTON-EXCHANGED
CHANNEL WAVEGUIDE CUTOFF MODULATOR ARRAY


There exists a strong demand for a simple optical channel waveguide
modulator which possesses the following desirable attributes: 1. intensity
modulation in direct proportion to the modulation voltage with a large dynamic
range (in contrast to existing modulators such as the Mach-Zehnder
interferometer, directional coupler, TIR and X-modulators that exhibit a
sine-squared voltage dependence of the output intensity), 2. low drive voltage
and/or power requirement, 3. small real estate per modulator and thus a
practical substrate size for realization of the resulting modulator array or
arrays, and 4. simplicity in device configuration and fabrication process,
and thus viability of the manufacturing technology involved. We have most
recently succeeded in fabricating single-mode channel waveguides in X-cut
LiNbO3 substrates using the proton-exchanged (PE) process. To the best of
our knowledge, no report of such PE channel waveguides has appeared in the
literature. We have also fabricated cutoff modulators that utilize such
PE channel waveguides and applied thermal annealing to them. The preliminary
experimental results obtained thus far have shown that this thermally-annealed
cutoff modulator will provide the desirable attributes listed above.
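To make attribute 1 concrete, the sketch below compares an idealized sine-squared transfer function, of the form typical of interferometric modulators, with an ideal linear one, and measures how far each departs from a straight line across its full swing. The function names and the normalization to a half-wave voltage `v_pi` are illustrative assumptions, not parameters of the devices discussed here.

```python
import math
from typing import Callable


def sine_squared_response(v: float, v_pi: float) -> float:
    """Idealized interferometric modulator (assumed form): I/I0 = sin^2(pi V / (2 V_pi))."""
    return math.sin(math.pi * v / (2.0 * v_pi)) ** 2


def linear_response(v: float, v_full: float) -> float:
    """Idealized linear modulator: I/I0 = V / V_full, clipped to [0, 1]."""
    return min(max(v / v_full, 0.0), 1.0)


def max_nonlinearity(response: Callable[[float], float], v_max: float,
                     steps: int = 1000) -> float:
    """Largest gap between response(V) and the straight line (chord) joining
    its end points over 0 <= V <= v_max, sampled at steps + 1 points."""
    y0, y1 = response(0.0), response(v_max)
    worst = 0.0
    for n in range(steps + 1):
        v = v_max * n / steps
        chord = y0 + (y1 - y0) * v / v_max
        worst = max(worst, abs(response(v) - chord))
    return worst
```

With `v_max` equal to `v_pi`, the sine-squared curve departs from its chord by roughly 0.105 of full scale, while the ideal linear model departs by zero.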

Electro-optically induced channel guiding and cutoff modulation of a
light beam in GaAs, LiNbO3 and KNbO3 substrates have been previously
reported. The more recent of these earlier works (8-10) all utilized the
titanium-indiffusion (TI) process for fabrication of the waveguides in the
LiNbO3 substrates. In our work the PE process was employed to fabricate
single-mode channel waveguides of various channel widths, namely 2, 3, and
4 μm, in X-cut Y-propagation LiNbO3 substrates. We have designed, fabricated,
and tested a number of cutoff modulators (see Fig. 3) using such PE channel
waveguides and have obtained very encouraging and reproducible results. It
is to be noted that, in contrast to the devices previously reported, only a
single uniform section of channel waveguide is involved in the present
device. As a result, design and fabrication of the present device are
considerably simpler. Propagation cutoff, and thus intensity modulation, is
provided through electro-optic control of the extraordinary refractive index
n_e using the coefficient r33. For example, Plot A in Fig. 4 shows the output
light intensity as a function of DC drive voltage for the device with a 2-μm
channel width and a 5-mm electrode length, measured prior to thermal
annealing. It is seen that a voltage swing from -10 volts to +6 volts was
required to switch the modulator from maximum transmission to cutoff,
resulting in a modulation depth of 98%.
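The figures quoted for Plot A can be checked with simple arithmetic. The sketch assumes modulation depth is defined as (I_max - I_min)/I_max, a common convention that the text does not state explicitly; the extinction-ratio helper is an added illustration rather than a quantity reported here.

```python
import math


def modulation_depth(i_max: float, i_min: float) -> float:
    """Assumed definition: fraction of the peak intensity removed at cutoff."""
    if i_max <= 0:
        raise ValueError("i_max must be positive")
    return (i_max - i_min) / i_max


def extinction_ratio_db(i_max: float, i_min: float) -> float:
    """On/off ratio in decibels, 10*log10(I_max/I_min)."""
    if i_min <= 0:
        raise ValueError("i_min must be positive for a finite ratio")
    return 10.0 * math.log10(i_max / i_min)


def voltage_swing(v_start: float, v_end: float) -> float:
    """Peak-to-peak drive voltage needed to go from one state to the other."""
    return abs(v_end - v_start)


# Before annealing (Plot A): -10 V to +6 V, 98% depth => I_min = 0.02 * I_max.
print(voltage_swing(-10, 6))                       # 16
print(round(extinction_ratio_db(1.0, 0.02), 1))    # 17.0
```

Under the same assumed definition, the 97% depth reported after annealing corresponds to about 15.2 dB over a 7-volt swing.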

We have found it possible to bring the waveguide to the very edge of
cutoff by controlling both the exchange time at a fixed exchange temperature
of 245°C and the time of subsequent thermal annealing. Consequently, a
further reduction in the drive voltage could be accomplished by bringing the
effective refractive index of the guided mode practically to the substrate
index. This was facilitated by thermal annealing of the devices at
300°C for 7.0 min, in accordance with the fact that thermal annealing after
the PE process causes the index profile to shift from a step distribution
to a graded distribution with a lower index at the surface. Plot B of Fig. 5
shows the experimental demonstration of this drive voltage reduction through
thermal annealing, using the same cutoff modulator that was used to generate
Plot A. It is seen that a 97% modulation depth was obtained at a total voltage
swing of only 7 volts, namely from -4 to +3 volts. This plot also shows a
distinct improvement in the linearity of
the modulation after thermal annealing. This important experimental
observation may imply that the scattering loss within the modulation range
became more uniform after heat treatment. Another major benefit of thermal
annealing is elimination of the instability in the guided-mode index that
was often observed in the PE waveguides. This enhancement in waveguide
stability through annealing was also confirmed in the PE cutoff modulator just
described. Finally, in contrast to the TI devices, which often suffer from the
photorefractive effect and the concomitant optical damage at relatively
low light intensity, our PE devices were found to be more resistant to
optically induced refractive index instability. For example, no
optical damage was observed even after a two-hour continuous exposure to
6328-Å He-Ne laser light at an intensity as high as 10^… W/cm^2. This
intensity threshold is at least one order of magnitude higher than that for
the TI devices.
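The improvement in linearity reported for Plot B could be quantified in several ways. One simple, assumed metric is the peak residual from a least-squares line fitted to the (drive voltage, output intensity) curve, normalized by the intensity span; the sketch below implements it in plain Python. The metric and function names are illustrative choices, and any data passed to them would come from measured transfer curves, not from this sketch.

```python
from typing import Sequence, Tuple


def linear_fit(v: Sequence[float], i: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit I = slope * V + offset."""
    n = len(v)
    if n < 2 or n != len(i):
        raise ValueError("need at least two paired (V, I) samples")
    mean_v, mean_i = sum(v) / n, sum(i) / n
    sxx = sum((x - mean_v) ** 2 for x in v)
    if sxx == 0:
        raise ValueError("drive voltages must not all be equal")
    sxy = sum((x - mean_v) * (y - mean_i) for x, y in zip(v, i))
    slope = sxy / sxx
    return slope, mean_i - slope * mean_v


def nonlinearity(v: Sequence[float], i: Sequence[float]) -> float:
    """Assumed metric: peak residual from the best-fit line, as a fraction
    of the intensity span (0 means perfectly linear)."""
    slope, offset = linear_fit(v, i)
    span = max(i) - min(i)
    if span == 0:
        raise ValueError("intensity must vary over the sweep")
    return max(abs(y - (slope * x + offset)) for x, y in zip(v, i)) / span
```

A perfectly straight transfer curve scores 0; a quadratic toy curve such as `nonlinearity([0, 1, 2], [0, 1, 4])` scores about 0.167.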
ABSTRACT

This  research  program  addressed  fundamental  performance  issues  and  tradeoffs  of 
a  variety  of  optical  processing  and  computing  components,  such  as  spatial  light 
modulators, photorefractive volume holographic optical elements, and threshold arrays.
These issues were examined in the context of incorporating such components into
optical  processing  and  computing  systems,  in  order  to  establish  predicted  performance 
boundaries  based  where  possible  on  physical  principles. 


OPTICAL  COMPUTING  COMPONENTS: 
FUNDAMENTAL  ISSUES 


TECHNICAL  SUMMARY 


I. OBJECTIVES

This  program  addresses  a  number  of  critical  issues  that  potentially  define  and 
delimit  the  implementation  of  optical  processing  and  computing  systems. 

Recently,  tremendous  advances  have  occurred  in  the  development  of  potential 
algorithms  and  architectures  for  dimensionally  large  computational  problems  that  are 
particularly well suited to parallel processing. For the most part, however, these
suggested algorithms and architectures have been developed without careful attention
to constraints imposed by fundamental physical effects that bound implementable device
performance.

The  past  few  years  have  also  witnessed  dramatic  growth  in  available  component 
technologies.  These  include  advances  in  bulk  wave  acoustooptic  modulators,  surface 
acoustic  wave  devices,  novel  one-  and  two-dimensional  spatial  light  modulators,  new  and 
improved  photorefractive  materials  for  volume  holographic  optical  elements,  multiple 
quantum  well  and  superlattice  structures,  and  bistable  optical  devices. 

Recently, several research groups have begun to examine the implications of the
sometimes conflicting requirements placed on optical information processing and computing
components by proposed computational architectures and their associated algorithms, and
also on the architectures by available components [1, 2, 3, 4]. This type of analysis
proceeds by examining the ultimate limits of system performance achievable by various
architectural arrangements of presently available as well as potentially available
components. These limits derive from power dissipation, thermal noise, input power,
quantum uncertainty, material nonlinearity, and detection signal-to-noise considerations.
The  perspective  taken  here  is  that  accelerated  research  and  development  of  optical 
information  processing  and  computing  systems  will  be  most  effective  if  continued 
research  is  focused  on  physically  realizable  components  with  tractable  technological 
hurdles. 

During the research program, we proposed to study a wide range of potential
algorithms, architectures, and components with the goal of identifying eventual
performance boundaries that are based on fundamental physical limitations. The program
was focused so as to elucidate significant avenues of opportunity for both optical
processors/computers and hybrid optical interconnect/VLSI dynamically reconfigurable
machines.

II.  DESCRIPTION  OF  THE  RESEARCH  EFFORT 

The  research  performed  under  this  contract  consisted  of  two  principal  components, 
as  outlined  in  the  statement  of  work.  The  first  involved  technical  interactions  with  other 
subcontractors at several individual and group meetings (including the 1985 Annual
Meeting of the Optical Society of America, Washington, D.C., October 1985, and
participation in the UDRI Program Review). These interactions were focused on
elucidating  optimized  goals  for  continued  and  accelerated  research  in  optical  processing 
and  computing  algorithms,  architectures,  components,  and  materials.  The  second 
principal  component  comprised  research  on  fundamental  physical  limitations  to  device 
performance,  with  results  described  in  this  section. 

The  scope  of  the  effort  was  limited  at  the  outset  to  the  study  of  problem  definition 
and  feasibility  of  approach.  Nonetheless,  several  important  results  were  obtained  even  in 
this  preliminary  phase,  as  outlined  below. 

Four  principal  areas  of  investigation  were  identified  as  of  critical  importance  to  the 
eventual  implementation  of  optical  processing  and  computing  systems:  spatial  light 
modulators  (both  optically  and  electrically  addressed),  dynamically  programmable  volume 
holographic  optical  elements,  threshold  arrays,  and  absolute  limits  imposed  by  the  physics 
of  computation. 

The  investigation  of  spatial  light  modulator  limitations  focused  on  fundamental 
effects  that  are  technology-independent.  Of  primary  importance  is  consideration  of  the 
quantum  fluctuations  characteristic  of  incident  illumination,  and  the  concomitant  effects  of 
such  fluctuations  on  the  tradeoffs  allowable  among  spatial  resolution,  dynamic  range,  and 
frame  rate.  It  was  found  that  for  the  case  of  binary  operations,  such  limitations  are  quite 
similar  to  those  encountered  in  traditional  electronic  computing  machines,  with  the 
exception  that  the  energy  cost  per  photon  is  typically  larger  than  the  energy  cost  for 
transport  of  the  corresponding  electron.  On  the  other  hand,  the  case  of  analog  (high 
dynamic  range)  computation  presents  an  interesting  situation  in  which  quantum 
fluctuations  play  a  fundamentally  important  role  in  establishing  minimum  performance 
boundaries [4]. These performance boundaries are surprisingly positioned, and thus have
implications  for  device  design  that  have  not  heretofore  been  taken  into  consideration. 
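As a rough illustration of how quantum (shot-noise) fluctuations couple dynamic range, spatial resolution, and frame rate, the sketch below assumes Poisson photon statistics and a deliberately simple criterion: adjacent gray levels must be separated by `k_sigma` standard deviations of the worst-case (full-scale) noise. Under that assumption the photon count per pixel grows as the square of the number of levels, so the minimum optical power scales as pixels x frame rate x (levels - 1)^2. The criterion, defaults, and function names are assumptions for illustration, not the analysis of reference [4].

```python
PLANCK = 6.62607015e-34     # J s (exact SI value)
LIGHT_SPEED = 2.99792458e8  # m/s (exact SI value)


def photons_per_pixel(levels: int, k_sigma: float = 1.0) -> float:
    """Assumption: full-scale mean photon count N must make the level spacing
    N/(levels-1) at least k_sigma * sqrt(N), the worst-case Poisson noise.
    Solving N/(L-1) >= k*sqrt(N) gives N >= (k*(L-1))**2."""
    if levels < 2:
        raise ValueError("need at least two gray levels")
    return (k_sigma * (levels - 1)) ** 2


def photon_energy(wavelength_m: float) -> float:
    """E = h c / lambda, in joules."""
    return PLANCK * LIGHT_SPEED / wavelength_m


def min_optical_power(levels: int, pixels: float, frame_rate_hz: float,
                      wavelength_m: float, k_sigma: float = 1.0,
                      quantum_efficiency: float = 1.0) -> float:
    """Incident optical power (W) so that, after the assumed quantum
    efficiency, every pixel receives the photon budget above every frame."""
    n = photons_per_pixel(levels, k_sigma)
    return (n * pixels * frame_rate_hz * photon_energy(wavelength_m)
            / quantum_efficiency)
```

For example, 256 levels on 10^6 pixels at a 1 kHz frame rate and 632.8 nm requires about 20 μW even with ideal detection. For binary operation this criterion is too crude; error-probability arguments set the photon budget there.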

We have now extended this concept to include the cases of mutual
coherence/incoherence between source and detector, and are in the process of applying the
principles of coherent detection from communication theory to the optical
processing/computing case of area detection. It should be pointed out that the process
of incoherent-to-coherent conversion is intrinsically a detection, with the spatial light
modulator itself taking the place of the detector. A second equivalent situation occurs
when  the  input  signal  is  detected  either  electronically  or  optically  on  a  detector  (either 
serial  or  parallel),  and  is  subsequently  sequenced  onto  the  SLM  by  either  electronic  or 
optical  encoding  techniques. 

The  use  of  photorefractive  materials  for  the  fabrication  of  dynamically 
programmable  volume  holographic  optical  elements  has  received  considerable  attention 
for  a  wide  range  of  applications  in  optical  processing  and  computing.  It  is  therefore  of 
considerable  interest  to  examine  potential  fundamental  limitations  of  this  technology  that 
are  not  material-dependent.  This  investigation  is  quite  timely,  as  intensive  studies  of 
photorefractive  materials  have  been  undertaken  recently,  and  such  studies  are  time- 
consuming,  difficult  to  perform  reliably,  expensive,  and  in  most  cases  somewhat 
investigator-dependent.  During  the  contract  period,  we  have  developed  an  extensive 
interactive  computer  model  of  the  holographic  grating  formation  process,  with  capability 
for  inclusion  of  a  wide  range  of  material  characteristics,  exposure  parameters,  physical 
mechanisms,  polarization  effects,  and  volume  diffraction  characteristics.  This  model  has 
allowed  us  to  explain  all  observed  characteristics  of  photorefractive  materials  such  as 
bismuth  silicon  oxide,  and  to  further  predict  a  number  of  fundamental  constraints. 
Several  theorems  have  been  proven  regarding  the  nature  of  diffraction  from  birefringent 
phase  gratings  in  the  Bragg  regime,  particularly  with  regard  to  the  resultant  polarization 
states  (which  are  of  considerable  interest  from  signal-to-noise  considerations  for 
orthogonalization  of  the  incident  and  diffracted  beams). 
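For orientation, a textbook result against which such grating models are often checked is Kogelnik's coupled-wave expression for a lossless, unslanted transmission phase grating read out at the Bragg angle: η = sin²(π Δn d / (λ cos θ)). The sketch below evaluates it. It is a swapped-in standard formula, not the report's model, and it ignores absorption, birefringence, and the polarization effects that the report's model includes.

```python
import math


def kogelnik_efficiency(delta_n: float, thickness_m: float,
                        wavelength_m: float,
                        bragg_angle_rad: float = 0.0) -> float:
    """Diffraction efficiency of a lossless, unslanted transmission phase
    grating at Bragg incidence (Kogelnik coupled-wave theory).

    delta_n          amplitude of the refractive-index modulation
    thickness_m      grating thickness d
    wavelength_m     free-space readout wavelength
    bragg_angle_rad  Bragg angle measured inside the medium
    """
    nu = math.pi * delta_n * thickness_m / (
        wavelength_m * math.cos(bragg_angle_rad))
    return math.sin(nu) ** 2


def full_efficiency_thickness(delta_n: float, wavelength_m: float,
                              bragg_angle_rad: float = 0.0) -> float:
    """Smallest thickness giving 100% efficiency (nu = pi/2)."""
    return wavelength_m * math.cos(bragg_angle_rad) / (2.0 * delta_n)
```

With an assumed index modulation of 1e-4 at 632.8 nm and normal incidence, full efficiency requires a grating about 3.16 mm thick.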

Recently, we have shown that constant velocity gratings, which
have received considerable attention for application to phase conjugate image
amplification, are likely inapplicable to most optical processing and
computing applications. In fact, we have shown that as the modulation index of the
writing beams (i.e., of the stored grating) approaches unity, the stationary grating case
provides  significantly  larger  diffraction  efficiency  than  the  constant  velocity  grating  case. 
This  result  has  significant  implications  for  the  utilization  of  transient  gratings  in  system 
designs  incorporating  photorefractive  materials. 

Threshold  arrays  are  needed  in  optical  processing  and  computing  for  a  wide  range 
of  applications,  with  a  concomitant  wide  range  of  required  performance  parameters.  In 
particular,  while  bit-oriented  optical  computers  may  require  ultra-low  power  switching 
arrays  operating  at  relatively  high  frame  rates,  associative  memory  processors  may  have 
considerably  relaxed  requirements  on  switching  energy  per  pixel  and  frame  rate.  During 
the  contract  period,  we  have  examined  the  power  consumption  requirements  of  all 
reported  threshold  array  elements,  and  find  that  they  are  in  all  cases  to  date  too  high  for 
potential inclusion in a bit-mapped parallel digital optical computer that is competitive
with currently available electronic technology. A key result is that such performance
boundaries place increased importance on the complexity of interconnections designed
into each stage of the processor. The higher the level of interconnection complexity per
iteration (bit plane), the greater the chance that such systems will prove feasible.
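The power argument can be sketched as simple bookkeeping. Assuming every pixel switches once per frame, array power is pixels x switching energy x frame rate; dividing the switching energy by the number of weighted inputs each threshold decision combines per bit plane (its fan-in) suggests why richer interconnection per iteration helps. The function names and the numbers in the example are hypothetical, not measured values of any reported threshold device.

```python
def array_power(pixels: float, switching_energy_j: float,
                frame_rate_hz: float) -> float:
    """Power (W) if every pixel switches once per frame (assumption)."""
    return pixels * switching_energy_j * frame_rate_hz


def energy_per_input(switching_energy_j: float, fan_in: int) -> float:
    """Switching energy amortized over the weighted inputs that one threshold
    decision combines per bit plane (hypothetical bookkeeping)."""
    if fan_in < 1:
        raise ValueError("fan_in must be at least 1")
    return switching_energy_j / fan_in


# Hypothetical example: 1000 x 1000 array, 1 pJ per switch, 1 MHz frame rate.
print(round(array_power(1_000_000, 1e-12, 1e6), 6))   # 1.0 (watt)
```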

During the contract period, we have examined in considerable detail the implications
for  optical  processing  and  computing  of  recent  assertions  concerning  negligible 
computational  energy  consumption  in  reversible  computational  elements  [5,6,7].  Although 
we  are  in  full  agreement  that  such  systems  are  in  fact  theoretically  achievable,  an 
important  conclusion  has  emerged  from  our  study.  A  basic  tenet  of  the  reversible 
machine  is  an  adiabatic  approach  to  each  binary  decision.  This  implies  not  only  a  large 
uncertainty  as  to  the  time  constant  over  which  the  decision  is  registered,  but  also 
intrinsically  large  time  constants  for  even  a  minimum  decision  time.  Therefore,  it  can  be 
stated that the energy cost of computation so often quoted previously is not in fact
inherent in the nature of computation, but rather in deterministic
computations  (in  which  the  decision  to  be  rendered  is  stabilized  against  uncertainties  in 
the  shortest  possible  time).  Thus  the  energy  cost  of  computation  can  only  be  calculated 
within  the  context  of  a  particular  architecture,  and  within  carefully  stated  performance 
goals  for  overall  processor  speed. 
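The trade between energy and decision time can be illustrated with a standard electronic example of adiabatic switching, offered here as an analogy rather than the report's own analysis. Charging a capacitance C to V through a resistance R with a linear voltage ramp of duration T dissipates E = C V² (τ/T)[1 - (τ/T)(1 - e^(-T/τ))], with τ = RC: this approaches C V²/2 for an abrupt step and falls roughly as (τ/T) C V² for slow ramps, so lower dissipation is bought with longer switching time. The Landauer bound k_B T ln 2, the per-bit erasure cost often quoted for irreversible computation, is included for comparison. Function names and example values are illustrative assumptions.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def landauer_limit(temperature_k: float) -> float:
    """Minimum energy to erase one bit irreversibly: k_B * T * ln 2."""
    return BOLTZMANN * temperature_k * math.log(2.0)


def ramp_charging_dissipation(r_ohm: float, c_farad: float, v_volt: float,
                              ramp_time_s: float) -> float:
    """Energy dissipated in R while a linear voltage ramp of duration
    ramp_time_s charges C to v_volt, including settling after the ramp ends.
    (An RC analogy for adiabatic switching, assumed for illustration.)"""
    if ramp_time_s <= 0:
        return 0.5 * c_farad * v_volt ** 2   # abrupt step: half of C V^2 is lost
    tau = r_ohm * c_farad
    ratio = tau / ramp_time_s
    # expm1(-x) == exp(-x) - 1, computed accurately for small x,
    # so 1 + ratio * expm1(-x) == 1 - ratio * (1 - exp(-x)).
    return c_farad * v_volt ** 2 * ratio * (
        1.0 + ratio * math.expm1(-ramp_time_s / tau))
```

With R = 1 kΩ, C = 1 pF (τ = 1 ns) and V = 1 V, an abrupt step dissipates 0.5 pJ, while a 1 μs ramp dissipates about 1e-3 pJ; both remain far above the roughly 2.9e-21 J Landauer bound at 300 K.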

III.  CONCLUSIONS  AND  RECOMMENDATIONS 

From  the  results  of  this  preliminary  research  effort,  we  conclude  that  numerous 
performance  limitations  will  accrue  to  eventual  optical  processing  and  computing  systems 
that  derive  principally  from  inherent  limitations  in  materials,  devices,  and  system 
architectures. In particular, a number of such limitations were derived for spatial light
modulators, photorefractive materials and dynamically programmable volume holographic
optical elements, and threshold arrays. In addition, the energy of computation assignable
to a given architecture
implemented  with  a  given  set  of  components  can  be  directly  determined  from  the  stated 
performance  goals  of  the  overall  system. 

The  principal  recommendation  of  this  study  effort  is  straightforward.  As  a 
community  of  optical  processing/computing  scientists  and  engineers,  we  must  pay  strict 
attention  to  fundamental  limitations  in  the  generation  of  optimistic  system  performance 
expectations.  In  addition,  much  work  remains  to  be  done  in  the  elucidation  of  these 
fundamental  performance  boundaries.  Continued  interaction  between  device/materials 
scientists  and  algorithm/architecture  scientists  will  be  crucial  to  establishing 
implementable  systems  concepts  with  a  viable  component  technology  base. 